In the Specification 

Please amend the specification by changing the word "step" on page 30 line 32 to --means--. 
A marked up paragraph showing the correction is just below. 

Marked up paragraph from page 30: 

The CL-F region and covering markers are for a species and the one or more individuals are members 
of the species. Means for determining information on the presence or absence of each allele of each 
bi-allelic marker of the group in chromosomal DNA includes any means of determination. Means for 
determining information on the presence or absence of each allele of each bi-allelic marker of the 
group in chromosomal DNA includes means comprising oligonucleotide technology by using a set of 
oligonucleotides that is complementary to the group as discussed below. Information on the presence 
or absence of each allele in the chromosomal DNA is obtained using a DNA specimen from each of 
one or more individuals of the sample or by using one or more DNA pools of DNA specimens from two 
or more individuals of the sample. Any apparatus that obtains genotype data or sample allele 
frequency data (similar to the data of the step d) of process #1) by determining the presence or 
absence of each allele of each bi-allelic marker of the group in the chromosomal DNA of one or more 
individuals is an example of this version of the invention. Versions of this apparatus also obtain a 
combination of genotype data and sample allele frequency data similar to the data of the step d) of 
process #1. The details of [step] means b) will be clear to those of ordinary skill in the art. 

In the Specification (continued) 

Please amend the specification: on page 6 line 10, insert the words --Muller-Myhsok & Abel (1997) 
independently made a similar observation, but they emphasized the weakness of TDT power when the 
m/p ratio departs from unity and δ is not close to δ_max.-- 

A marked up paragraph showing the correction is just below. 
Marked up paragraph from page 6: 

published. 11 In this paper a general framework for determining the power of the TDT in many different 
situations is presented. The analysis of Risch and Merikangas 8 and others is shown by the inventor to 
be a special case of his general framework. His observations and calculations published in this paper 
have shown that the TDT has increased power in more common, less optimal situations as well as the 
less common, optimal situation cited by Muller-Myhsok and Abel. 9 As opposed to the observation of 
Muller-Myhsok and Abel, the inventor's calculations indicate that association tests such as the TDT 
have increased power in typical situations even when the ratio m/p departs significantly from unity and/or 
the linkage disequilibrium between the analyzed (marker) allele and disease polymorphism is only 
half its maximum possible value. The inventor arrived at these conclusions independently and did not 
derive them from others. Muller-Myhsok & Abel (1997) independently made a similar observation, but 
they emphasized the weakness of TDT power when the m/p ratio departs from unity and δ is not close 
to δ_max. 
Remarks 



The amendment on page 6 is the incorporation of matter from the last paragraph of 
page 166 of the inventor's paper. This paper is incorporated by reference into the 
patent application and is cited in footnote 11 on page 6. 



The amendment on page 30 is the correction of an error that a person of ordinary 
skill in the art would immediately recognize as an error. 



Also included herewith is new evidence/arguments regarding patentability. It is given 
below. 



New evidence/arguments 

The applicants intend to submit new claims that depend from claims for which the applicants 
have received a Notice of Allowance (dated 18 April 2003). The applicants intend to submit 
these new claims before the expiration of the 3 month time period constituting the requested 
period for limited Suspension of Action under 37 CFR 1.103(c). 

It is also the wish of the applicants to pursue apparatus claims in future prosecution. Also 
enclosed are copies of eight publications. These copies of publications are being sent to the 
USPTO as evidence of the patentability of apparatus claims/embodiments and of the support 
for such apparatus claims/embodiments in the patent application. More specifically, the 
Examiner stated in the last paragraph of page 4 of the Final Office Action dated 02 OCT 
2002 that, should the applicants wish to pursue apparatus claims in future prosecution, 
concrete examples are needed in the specification in order for "means plus function" 
language to be interpreted and searched. Therefore these eight publications are being 
provided as evidence in support of patentability. 

Each of these publications is cited in the specification of the patent application and is 
incorporated by reference into the application. More specifically, see some of the concrete 
examples described on p. 24 lines 1 to 2, p. 29 lines 28 to 30, and p. 34 lines 3 to 18 of the 
patent application. Each of these sections of the application (and the associated publications 
in the endnotes) describes some concrete examples of technology in the patent application for 
the interpretation of "means plus function" language. These sections of the application also 
refer to publications in the endnotes. The publications in the endnotes are incorporated by 
reference into the patent application. 

The copies of eight publications enclosed herein are the publications listed below: 
1) Weighing DNA for Fast Genetic Diagnosis. Science, March 27, 1998, vol. 279, pp. 2044-2045. 

2) Accessing Genetic Information with High-Density DNA Arrays, Mark Chee, et al., Science, 
vol. 274, Oct. 25, 1996, pp. 610-614. 

3) Genetic analysis of amplified DNA with immobilized sequence-specific oligonucleotide 
probes, Saiki, et al., Proc Natl Acad Sci USA vol. 86, pp. 6230-6234. 

4) Allele-specific enzymatic amplification of β-globin genomic DNA for diagnosis of sickle 
cell anemia, Wu, et al., Proc Natl Acad Sci USA vol. 86, pp. 2757-2760. 

5) Automated DNA diagnostics using an ELISA-based oligonucleotide ligation assay, 
Nickerson, et al., Proc Natl Acad Sci USA vol. 87, pp. 8923-8927. 

6) Padlock Probes: Circularizing Oligonucleotides for Localized DNA Detection, Science, 
Sept. 30, 1994, vol. 265, pp. 2085-2088. 

7) SNP attack on complex traits. Nature Genetics, Nov. 1998, vol. 20 no. 3, pp. 217-218. 

8) Large Scale Identification, Mapping, and Genotyping of Single-Nucleotide Polymorphisms 
in the Human Genome, Wang, et al., Science, May 15, 1998, vol. 280, pp. 1077-1081. 

More specifically, publication 1) is an example of a technology that uses mass 
spectrometry, specifically MALDI-TOF, for genotyping. Such a technology is a 
nonlimiting concrete example of a technology that is described in the application and 
is used by apparatus versions of the invention. And this concrete example allows for 
interpretation of means plus function language. 

Publication 2) is an example of a technology that uses oligonucleotides for 
genotyping, specifically high-density DNA arrays. Such a technology is a nonlimiting 
concrete example of a technology that is described in the application and is used by 
apparatus versions of the invention. A similar technology is described in publication 
8), specifically genotyping chips. And each of these nonlimiting concrete examples 
allows for interpretation of means plus function language. 

Other examples of technologies that use oligonucleotides in various ways for 
genotyping, for example using PCR or other kinds of hybridization reactions, are 
described in the other publications enclosed herewith. Each of these technologies is 
a nonlimiting concrete example of a technology that is described in the application 
and is used by apparatus versions of the invention. And each of these nonlimiting 
concrete examples allows for interpretation of means plus function language. 

There are other similar publications cited (and incorporated by reference) in the 
application that are not enclosed herewith. And this discussion is not necessarily 
exhaustive. No technology cited herein is admitted to being prior art with respect to 
the invention by its mention or discussion in this submission. 



Sincerely, 




Robert O. McGinnis, Agent of Record, Reg. No. 44,232 






The San Diego researchers were looking 
for a way to help the right peg find its hole, 
and they settled on DNA. The chemical 
bases that make up DNA — cytosine, gua- 
nine, adenine, and thymine — will bind to 
each other only in particular pairings: C with 
G and A with T. Hence, a single strand made 
up of the bases ATTTGC will bind strongly 
with its complementary strand, TAAACG, 
and not with any other sequence. 
The researchers set out to exploit 
this selectivity by attaching short 
complementary strands of DNA to 
the pegs and substrate to help the 
devices find their correct positions. 

In their first experiment, the 
team coated a substrate with a par- 
ticular short strand of DNA. They 
then covered parts of the substrate 
with a mask and exposed it to ultra- 
violet light. The light chemically 
altered the DNA in exposed areas 
so that it could no longer bind to 
complementary strands. The re- 
searchers then coated some microbeads — 
which acted as dummy devices — with strands 
of DNA complementary to those on the 
substrate. When a fluid carrying the coated 
beads was splashed over the substrate, the 
beads successfully bound only to those areas 
that had not been exposed to UV light. One 
drawback of the technique is that it worked 
only for small devices, several hundred micrometers 
across, that would flow easily and 
not block other devices. 

[Figure: a microbead bound to a coated silicon substrate. Caption: "Nature's glue. DNA strands bind beads and substrate together."] 

In a second experiment, designed to show 
that several varying kinds of "devices" could be 
deposited at once, the group used masks to 
deposit four different types of DNA strands 
onto a substrate and then attached complementary 
strands to four different fluorescent 
molecules. When the labeled molecules were 
splashed onto the substrate, the pattern of fluo- 
rescence showed that they had bound only 
to the appropriate regions of complementary 
DNA. In a real system, this would mean that 
four completely different types of devices could 
be attached to many selected sites on a chip. 
The researchers realize, however, that just 
providing the glue is not going to be enough. 
They are now looking for more active ways to 
guide the devices to their correct positions. 
One possibility is to add extra chemical groups 
to the DNA on the devices to give them an 
electric charge, then create electric fields on 
the substrate to attract the charged devices 
to "landing sites." The team is also inves- 
tigating other techniques, such as creating 
currents in the fluid that 
would sweep the tiny devices 
to the right places. 
An even bigger challenge 
will be creating an 
electrical connection between 
the devices and their 
host semiconductor. The 
team is looking at the possi- 
bility of putting the DNA 
glue on the top of devices 
and bonding them, upside 
down, onto a dummy sub- 
strate. Once all the devices 
are in position, the dummy 
could be flipped over and pressed down on the 
real substrate. The substrate might be coated 
with molten solder, which would add an elec- 
trical bond to the mass marriage. 

-Sunny Bains 

Sunny Bains is a science writer based in the San 
Francisco Bay area. 
Weighing DNA for Fast Genetic Diagnosis 



The modern doctor's little black bag, al- 
ready overflowing with high-tech diagnostic 
devices, may soon have to make room for 
another advance. To diagnose a disease, 
judge future risks, or design a treatment, 
doctors will one day want to know which 
disease-related genes a patient carries. And 
they will want this diagnostic verdict to be as 
fast and accurate as a cholesterol or blood 
chemistry test today. As Charles Cantor, di- 
rector of Boston University's Center for Ad- 
vanced Biotechnology, puts it: "You need a 
detection system that can identify the gene 
sequences that you are looking for with high 
specificity, quickly, and in large volumes. 
The best analytical tool for doing this," he 
adds, "is mass spectrometry." 

Borrowed from chemistry, this technol- 
ogy is a sharp departure from current meth- 
ods, which identify a gene sequence by allow- 
ing it to bind to a matching probe, either on 
a gel or a chip. Instead, a mass spectrometer 
vaporizes the DNA and accelerates the mol- 
ecules through a vacuum chamber with the 
help of an electric field. Tiny differences in 
the time it takes the DNA fragments to reach 
the detector reveal small differences in their 
mass, and hence their sequence. 
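
The timing principle described above can be sketched numerically. For an idealized linear instrument, an ion of mass m and charge z·e that is accelerated through a potential V and then drifts a length L arrives after t = L·sqrt(m / (2·z·e·V)), so flight time grows with the square root of mass. The drift length and accelerating voltage below are illustrative assumptions, not values taken from the article, and the function name is hypothetical.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, coulombs
DALTON = 1.66053906660e-27   # one dalton in kilograms

def flight_time(mass_da, charge=1, voltage=20_000.0, drift_m=1.0):
    """Idealized drift time (seconds) of an ion in a linear TOF tube.

    voltage (volts) and drift_m (meters) are assumed, illustrative values.
    """
    mass_kg = mass_da * DALTON
    return drift_m * math.sqrt(mass_kg / (2 * charge * E_CHARGE * voltage))

if __name__ == "__main__":
    light = flight_time(7000.0)
    heavy = flight_time(7009.0)
    print(f"{light * 1e6:.3f} us vs {heavy * 1e6:.3f} us")
```

In this idealized picture, quadrupling the mass doubles the flight time, which is why small mass differences appear as small but measurable arrival-time differences.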



The basic technique used for biomole- 
cules is one with an unwieldy name, matrix- 
assisted laser desorption/ionization-time-of- 
flight mass spectrometry, but a harmonious 
acronym, MALDI-TOF. It is now a decade 
old, but recent improvements have made it 
a hot commodity among companies 
hoping to commercialize DNA analysis. 
"With today's technology, MALDI- 
TOF can analyze hundreds of DNA 
samples ... in a matter of a few min- 
utes," says Daniel P. Little, who di- 
rects mass-spectrometry development 
at Sequenom Inc., a San Diego-based 
company hoping to be generating di- 
agnostic products within 6 months. 

The standard way to distinguish 
different variants of a gene is to chop 
the DNA into fragments, separate them 
on a gel, and apply probes labeled with 
fluorescence or radioactivity, which 
bind to fragments with a particular 
sequence and light them up. But the 
process is slow and the gels can be hard 
to interpret. Newer techniques embed 
an array of different DNA probes on a 
single chip, allowing researchers to test 
for many gene variants at once. These 
so-called DNA chips can screen DNA quickly. 
But, as Cantor explains, the probes sometimes 
bind to sequences they don't completely 
match, which can limit the chips' accuracy. 

[Figure caption: All in the timing. A mass spectrometer sizes up DNA 
by vaporizing and ionizing it, accelerating the molecules, 
and recording their arrival times at a detector.] 

Mass spectrometry may combine the DNA 
chip's speed with exquisite accuracy. The 
technique has long offered chemists a fast way 
to sort small molecules that vaporize naturally 
or can be coaxed into a vapor with bursts of 
energy from a laser or ion beam. But vaporizing 
large biomolecules while keeping them 
intact once seemed impossible. A decade 
ago, however, Franz Hillenkamp and colleagues 
at Westfalische Wilhelms University 
in Munster, Germany, found a way to do so 
with proteins: Cocrystallize them with certain 
small molecules, collectively called matrices. 
When a nanosecond laser pulse vaporizes the 
matrix, the resulting puff of material gently 
lifts the ionized biomolecule as well. 

DNA was a tougher problem. But in 1993, 
Christopher Becker, then at SRI International 
in Palo Alto, California, and now at 
GeneTrace Systems in Menlo Park, California, 
found a simple matrix compound, 
3-hydroxypicolinic acid, that worked with 
DNA sequences 20 to 25 bases long. By trial 
and error, MALDI practitioners have come 
up with several new matrices that work with 
DNA fragments as long as 100 bases. 

The latest MALDI-TOF machines allow 
the cloud of matrix molecules to dissipate 
before applying an electric field. The field 
accelerates the charged DNA fragments 
toward a detector, and the differences in 
time of flight can reveal mass differences as 
small as 0.03%. If the DNA sequences from 
a gene have the same length, as they do if 
they have been produced by the polymerase 
chain reaction, any departure from the mass 
of the normal sequence reflects a mutation 
that has deleted or added bases or substituted 
others that have a different mass. 
"The results are an absolute indicator of the 
presence or absence of specific DNA sequences," 
says Sequenom's Little. MALDI-TOF 
can distinguish gene variants that 
differ by as little as a single base pair, and it 
can also analyze microsatellites, stretches 
of two-, three-, or four-nucleotide repeats 
often used as markers for locating disease-causing 
genes. 
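
To see how the quoted 0.03% mass resolution relates to fragment length, the following sketch estimates the fractional mass change caused by a single base substitution. The residue masses are commonly quoted approximate averages for unmodified DNA and are an assumption here, as are the function names; end-group corrections are ignored, so the numbers are only indicative.

```python
# Approximate average masses (Da) of nucleotide residues in a DNA strand.
# Commonly quoted values, assumed here for illustration only.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}

RESOLUTION = 0.0003  # 0.03%, the figure quoted in the article

def fragment_mass(seq):
    """Approximate mass of a DNA fragment, ignoring end-group corrections."""
    return sum(RESIDUE_MASS[base] for base in seq)

def relative_shift(seq, position, new_base):
    """Fractional mass change caused by substituting one base."""
    old = fragment_mass(seq)
    mutated = seq[:position] + new_base + seq[position + 1:]
    return abs(fragment_mass(mutated) - old) / old

def resolvable(seq, position, new_base, resolution=RESOLUTION):
    """True if the substitution shifts the mass by more than the resolution."""
    return relative_shift(seq, position, new_base) > resolution
```

Under these assumed masses the smallest substitution shift is an A/T exchange (about 9 Da), which stays well above 0.03% for a 25-base fragment but falls below it for fragments of a few hundred bases.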

Besides offering unmatched precision, 
MALDI-TOF is inherently fast. The DNA 
forms a vapor and flies to the detector in 
fractions of a second; even repeating the pro- 
cess several times with the same sample to 
boost the sensitivity takes as little as 2 sec- 
onds. By preparing the samples in a grid and 

having the laser scan each spot in turn, a 
MALDI-TOF instrument can analyze 100 
samples or more in a matter of minutes. 

The combination of speed and accuracy 
could give the technique a role in genome 
sequencing as well as diagnosis. Standard, 
Sanger-type DNA sequencing generates many 
partial copies of a DNA sequence, each one 
starting at one end of the sequence and ending 
with a different one of the constituent bases. 
To determine the original sequence, biologists 
need to know the final base on each partial 
copy, together with the copy's length. Doing 
so now requires reading hundreds of bands 
on gels. But by sending the mixture through a 
mass spectrometer, biologists could quickly 
read off the fragments' lengths and, from 
the mass differences between successive fragments, 
the final base on each one. Investigators 
at both GeneTrace and Sequenom 
have published sequences determined with 
MALDI-TOF, the latest one, from Sequenom, 
appearing in the April Nature Biotechnology. 

For practical gene sequencing, however, 
MALDI-TOF would have to work with DNA 
fragments much longer than the current 
100-base capacity. Becker reportedly has dis- 
covered a new proprietary matrix that he 
expects will extend MALDI-TOF's reach to 
1000-base sequences. "If you can really do 
upward of 1000 bases using this technique, 
and if it is indeed faster and cheaper, then 
this would be a big breakthrough for high- 
throughput sequencing," says Jeffrey Polish, who 
works in Mark Johnston's sequencing labora- 
tory at Washington University in St. Louis. 

In the meantime, the technology has no 
shortage of applications. Sequenom has shown, 
for example, that it can discriminate among 
30 of the mutations that cause cystic fibrosis 
and pick up polymorphisms in the apolipoprotein 
E gene, which have been linked to 
familial hyperlipidemias, heart disease, and 
Alzheimer's disease. GeneTrace has developed 
a mass spectrometry-based system that can 
analyze which genes are being expressed in 
cells by identifying expressed sequence tags, 
short stretches of DNA copied from the mes- 
senger RNA made by active genes. Knowing 
which genes are active in a tissue can help 
pharmaceutical companies determine which 
ones are good drug targets. 

With MALDI-TOF instruments running 
about $125,000 each — less than a standard 
clinical chemistry analyzer — these systems 
may also end up in large diagnostic labs. "Di- 
agnostics at the level of the gene is something 
that we know is valuable, but is difficult, slow, 
and expensive today," says David Cooper, 
chief scientific officer at Nichols Institute 
Reference Laboratories, a division of Quest 
Diagnostics, one of the big 3 national refer- 
ence laboratories. MALDI-TOF, he says, could 
be just the right medicine. 

-Joseph Alper 

Joseph Alper is a free-lance writer in Boulder, 
Colorado. 
Making a Bigger Chill With Magnets 



LOS ANGELES — Refrigerator magnets are 
best known for holding shopping lists and old 
postcards onto refrigerator doors. But in a few 
years, much more powerful magnets could be 
the key to keeping food cold in so-called 
magnetocaloric refrigerators, which would 
be more energy efficient and less polluting 
than standard models. Now a new class of 
magnetocaloric materials, announced here 
last week at a meeting of the American Physi- 
cal Society, could make these magnetic refrig- 
erators more practical and versatile. 

The magnetocaloric effect works when 
strong magnetic fields align quantum-mechanical 
"spins" of electrons within atoms. 
This transition reduces one aspect of 
the randomness, or entropy, of the atoms. 
But according to laws of thermodynamics, 
some other aspect of randomness has to 
increase in compensation, so the atoms in- 
crease the randomness of their velocities — 
vibrating and heating up. Once this heat is 
carried away by a coolant such as water, the 
field is removed and the effect works in re- 
verse, chilling the material and cooling a 
refrigerator. To date, the peak performance 
has been with the element gadolinium. 

By adding various amounts of silicon and 
germanium to gadolinium's crystal lattice, 
Vitalij Pecharsky and Karl Gschneidner of 
the Ames Laboratory at Iowa State Univer- 
sity discovered a new class of materials that 
can chill two to six times further in a single 
magnetic cycle, meaning that the refrigerators 
could operate with weaker magnetic 
fields or less material. Depending on the 
germanium-to-silicon ratio, the new mate- 
rials also operate from about room tempera- 
ture all the way down to -253 degrees Cel- 
sius. The cold end of the range would allow 
magnetocaloric freezers to liquefy hydrogen 
or natural gas for use in clean-burning power 
plants or future automobiles. 

To come up with the new compounds, the 
team followed up on hints that magneto- 
caloric materials containing gadolinium and 
either silicon or germanium — but not both — 
prefer a different range of temperatures than 
gadolinium alone. "We're not trying to come 
up with exotic new compounds out of the 
pure blue sky," says Gschneidner. The sur- 
prise, he says, was that the magnetocaloric ef- 
fect turned out to be far larger when both ger- 
manium and silicon were added to the material. 

"These new materials give you a lot more 
flexibility in designing magnetocaloric [re- 
frigerators]," says Carl Zimm, a senior scien- 
tist in magnetic refrigeration at Astronau- 
tics Corporation of America in Madison, 
Wisconsin. The team is still working on 
making enough of the material to try it out in 
Zimm's prototype gadolinium-based refrig- 
erator, which has been running for about a 
year. The test should take place "within a 
couple of months," says Gschneidner. 

-James Glanz 
Accessing Genetic Information with 
High-Density DNA Arrays 

Mark Chee, Robert Yang, Earl Hubbell, Anthony Berno, 
Xiaohua C. Huang, David Stern, Jim Winkler, David J. Lockhart, 
Macdonald S. Morris, Stephen P. A. Fodor 

Rapid access to genetic information is central to the revolution taking place in molecular 
genetics. The simultaneous analysis of the entire human mitochondrial genome is de- 
scribed here. DNA arrays containing up to 135,000 probes complementary to the 16.6- 
kilobase human mitochondrial genome were generated by light-directed chemical syn- 
thesis. A two-color labeling scheme was developed that allows simultaneous compar- 
ison of a polymorphic target to a reference DNA or RNA. Complete hybridization patterns 
were revealed in a matter of minutes. Sequence polymorphisms were detected with 
single-base resolution and unprecedented efficiency. The methods described are ge- 
neric and can be used to address a variety of questions in molecular genetics including 
gene expression, genetic linkage, and genetic variability. 



D-0.1 M KCl), Tat-SF/pp140 was eluted with increasing 
salt concentrations and was detected 
mostly in 0.2 to 0.4 M KCl fractions. These fractions 
were pooled, dialyzed against buffer D-0.1 M KCl, 
and loaded onto a glutathione Sepharose (Pharmacia) 
column containing GST-Tat fusion proteins. After 
the column was washed with buffer D-0.4 M KCl, 
Tat-SF/pp140 was eluted from the column with buffer 
D containing 1.4 M KCl. The estimated overall 
purification after these steps was ~3000-fold. In the 
experiment shown in Fig. 3, the 0.2 to 0.4 M KCl 
heparin Sepharose fraction containing Tat-SF activity 
was subjected to fractionation through an Affi-Gel 
10 matrix column (Bio-Rad) containing immobilized 
Tat. Tat-SF activity was eluted from the column with 
increasing salt concentrations. The 0.6 M KCl fraction 
was analyzed as described in Fig. 3. 
A central theme in modern genetics is the 
relation between genetic variability and phe- 
notype. To understand genetic variation and 
its consequences on biological function, an 
enormous effort in comparative sequence 
analysis will need to be carried out. Conventional 
nucleic acid sequencing technologies 
make use of analytical separation techniques 
to resolve sequence at the single nucleotide 
level (1, 2). However, the effort required 
increases linearly with the amount of se- 
quence. In contrast, biological systems read, 
store, and modify genetic information by mo- 
lecular recognition (3). Because each DNA 
strand carries with it the capacity to recognize 
a uniquely complementary sequence through 
base pairing, the process of recognition, or 
hybridization, is highly parallel, as every nu- 
cleotide in a large sequence can in principle 
be queried at the same time. Thus, hybrid- 
ization can be used to efficiently analyze 
large amounts of nucleotide sequence. In one 
proposal, sequences are analyzed by hybrid- 
ization to a set of oligonucleotides represent- 
ing all possible subsequences (4). A second 
approach, used here, is hybridization to an 
array of oligonucleotide probes designed to 
match specific sequences. In this way the 
most informative subset of probes is used. 
Implementation of these concepts relies on 
recently developed combinatorial technolo- 
gies to generate any ordered array of a large 
number of oligonucleotide probes (5). 



Affymetrix, 3380 Central Expressway, Santa Clara, CA 
95051, USA. 



The fundamentals of light-directed oli- 
gonucleotide array synthesis have been de- 
scribed (5, 6). Any probe can be synthe- 
sized at any discrete, specified location in 
the array, and any set of probes composed of 
the four nucleotides can be synthesized in a 
maximum of 4N cycles, where N is the 
length of the longest probe in the array. For 
example, the entire set of ~10^12 20-nucleotide 
oligomer probes, or any desired subset, 
can be synthesized in only 80 coupling cy- 
cles. The number of different probes that 
can be synthesized is limited only by the 
physical size of the array and the achievable 
lithographic resolution (7). 
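
The arithmetic behind the 4N bound and the ~10^12 figure can be checked directly. This is a minimal sketch; the function names are hypothetical and it models only the counting argument, not the synthesis chemistry.

```python
def max_coupling_cycles(longest_probe_len):
    """Upper bound on coupling cycles: one cycle per base type per position."""
    return 4 * longest_probe_len

def possible_probes(probe_len):
    """Number of distinct sequences of the given length over A, C, G, T."""
    return 4 ** probe_len

if __name__ == "__main__":
    print(max_coupling_cycles(20))   # cycles needed for 20-nucleotide probes
    print(possible_probes(20))       # size of the complete 20-mer set
```

The number of cycles grows linearly with probe length while the number of reachable sequences grows exponentially, which is the point the paragraph makes.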

An array consisting of oligonucleotides 
complementary to subsequences of a target 
sequence can be used to determine the iden- 
tity of a target sequence, measure its amount, 
and detect differences between the target 
and a reference sequence. Many different 
arrays can be designed for these purposes. 
One such design, termed a 4L tiled array, is 
depicted in Fig. 1A. In each set of four 
probes, the perfect complement will hybrid- 
ize more strongly than mismatched probes. 
By this approach, a nucleic acid target of 
length L can be scanned for mutations with 
a tiled array containing 4L probes. For ex- 
ample, to query the 16,569 base pairs (bp) of 
human mitochondrial DNA (mtDNA), only 
66,276 probes of the possible ~10^9 15-nucleotide 
oligomers need to be used. 
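
A 4L tiled design can be sketched as follows. This is an illustration under simplifying assumptions: probes are written base-for-base against the target (strand orientation is ignored), the substitution offset inside the probe is an arbitrary choice, and all names are hypothetical rather than taken from the paper.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
BASES = "ACGT"

def tiled_probe_sets(target, probe_len=15, sub_index=7):
    """Yield (queried_position, [four probes]) for each full window of target.

    sub_index is the 0-based offset of the substitution position inside the
    probe; the value 7 is an assumption made for this sketch.
    """
    for start in range(len(target) - probe_len + 1):
        window = target[start:start + probe_len]
        perfect = "".join(COMPLEMENT[b] for b in window)
        probes = [perfect[:sub_index] + b + perfect[sub_index + 1:] for b in BASES]
        yield start + sub_index, probes

def probe_count(target_len):
    """4L probes when each of the L positions is queried once."""
    return 4 * target_len

if __name__ == "__main__":
    print(probe_count(16569))   # probes needed for the mitochondrial genome
    print(4 ** 15)              # all possible 15-nucleotide oligomers
```

Each set contains exactly one perfect complement and three single-mismatch probes, so the brightest probe in a set indicates the base at the queried position.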

The use of a tiled array of probes to read a 
target sequence is illustrated in Fig. 1C. A 
tiled array of 15-nucleotide oligomers varied 
at position 7 from the 3' end (P^15,7) was 
designed and synthesized for mt1, a cloned 
sequence containing 1311 bp spanning the 
control region of mtDNA (8-11). The upper 
panel of Fig. 1C shows a portion of the fluorescence 
image of an array hybridized with 
fluorescein-labeled mt1 RNA (12). The base 
sequence can be read by comparing the intensities 
of the four probes within each column. 
For example, the column for position 16,493 
consists of the four probes, 3'-TGACATAGGCTGTAG, 
3'-TGACATCGGCTGTAG, 
3'-TGACATGGGCTGTAG, and 3'-TGACATTGGCTGTAG. 
The probe with the 
strongest signal is the probe with the A substitution 
(A, 49 counts; C, 8 counts; G, 15 
counts; and T, 8 counts, where the background 
is 2 counts), identifying the base at 
position 16,493 as U in the RNA transcript. 
Continuing the process, the sequence 
at each position can be read directly 
from the hybridization intensities. 
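
The column-reading step for position 16,493 can be sketched with the counts quoted above. The rule used here, picking the brightest probe after background subtraction, is a simplification assumed for illustration; the function and variable names are hypothetical.

```python
# Probe substitution base -> the base it pairs with in an RNA target.
RNA_PARTNER = {"A": "U", "C": "G", "G": "C", "T": "A"}

def call_base(counts, background=0):
    """Return (probe_base, target_base) for the brightest probe in a column."""
    net = {base: counts[base] - background for base in counts}
    probe_base = max(net, key=net.get)
    return probe_base, RNA_PARTNER[probe_base]

column_16493 = {"A": 49, "C": 8, "G": 15, "T": 8}   # counts quoted in the text
print(call_base(column_16493, background=2))        # ('A', 'U')
```

A practical base caller would also need a rule for columns with no clear winner, which this sketch leaves out.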

The effect on the array hybridization 
pattern caused by a single base change in 
the target is illustrated in Fig. 1B, and the 
detection of a single-base polymorphism is 
shown in the lower panel of Fig. 1C. The 
target was mt2, which differs from mt1 in 
this region by a T-to-C transition at position 
16,493. Accordingly, the probe with 
the G substitution (third row) displays the 
strongest signal. Because the tiled array was 
designed to complement mt1, the hybridization 
intensities of neighboring probes 
that overlap position 16,493 are also affected 
by the change in target sequence. The 
hybridization signals of 15 probe sets of the 
15-nucleotide oligomer tiled array are perturbed 
by a single base change in the target 
sequence. In the P^15,7 array, each probe 
querying the eight positions to the left and 
six positions to the right of the polymorphism 
contains at least one mismatch to the 
target. The result is a characteristic loss of 
signal or a "footprint" for the probes flanking 
a mutation position. Of the four probes 
querying each position, the loss of signal is 
greatest for the one designed to match mt1. 
We denote the subset of probes with zero 
mismatches to the reference sequence as P°. 
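
The extent of the footprint follows from the probe geometry. The sketch below reproduces the eight-left, six-right count stated for the 15-nucleotide array varied at position 7 from the 3' end; which side counts as "left" depends on how the sequence is drawn, and the function name and coordinate convention are assumptions.

```python
def footprint(poly_pos, probe_len=15, sub_pos_from_3prime=7):
    """Target positions whose probe sets overlap a single-base change.

    A probe with its query base at sub_pos_from_3prime covers
    (sub_pos_from_3prime - 1) target bases on one side of the queried
    position and (probe_len - sub_pos_from_3prime) on the other.
    """
    left = probe_len - sub_pos_from_3prime      # 8 for the 15,7 design
    right = sub_pos_from_3prime - 1             # 6 for the 15,7 design
    return range(poly_pos - left, poly_pos + right + 1)

if __name__ == "__main__":
    fp = footprint(16493)
    print(len(fp), fp[0], fp[-1])
```

The total always equals the probe length, because every probe that covers the changed base is affected, including the set that queries the changed position itself.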

A comparison of P° hybridization signals 
from a target to those from a reference is 
ideally obtained by hybridizing both sam- 
ples to the same array. We therefore devel- 
oped a two-color labeling and detection 
scheme in which the reference is labeled 
with phycoerythrin (red), and the target 
with fluorescein (green) (13). By processing 
the reference and target together, experi- 
mental variability during the fragmenta- 
tion, hybridization, washing, and detection 
steps is minimized or eliminated. In addi- 
tion, during cohybridization of the refer- 
ence and target, competition for binding 
sites results in a slight improvement in mismatch 
discrimination. Array hybridization is 
highly reproducible, and comparative anal- 
ysis of data obtained from separate but iden- 
tically synthesized arrays is also effective. 

The two-color approach was tested by analyzing 
a 2.5-kb region of mtDNA that spans 
the tRNA^Glu, cytochrome b, tRNA^Thr, 
tRNA^Pro, control region, and tRNA^Phe DNA 
sequences (14). A P^20,9 array (20-nucleotide 
oligomer probes varied at position 9 from the 
3' end) was designed to match the mt1 target 
(that is, P° sequence = mt1). The mt1 reference 
(red) and a polymorphic target sample 
(green) were pooled and hybridized simultaneously 
to the array. Differences between 
the target and reference sequences 
were identified by comparing the scaled red 
and green P° hybridization intensities (15). 
The marked decrease in target hybridization 
intensity, over a span of ~20 nucleotides, is 
shown for a single-base polymorphism at position 
16,223 (Fig. 2A). The footprint is 
enlarged when two polymorphisms occur in 
close proximity (within ~20 nucleotides) 
(Fig. 2B). When polymorphisms are clustered, 
the size of the footprint depends on 
the number of polymorphisms and their separation 
(Fig. 2C). 

To read polymorphisms accurately, we 
developed an algorithm that addresses the 
issue of multiple mismatches. The algo- 
rithm performs base identification but also 
flags regions of ambiguity caused by multi- 
ple mismatches. These regions are easily 
identified by the presence of a large foot- 
print (Fig. 2, B and C) or by two or more 
bases identified as differing from P° within 
the span of a single probe. Discrepancies 
between base identifications and footprint 
patterns are also flagged for further analysis 
(for example, a P° footprint in which no 
polymorphism is identified; such a pattern 
is typical of a deletion). Thus, base identi- 
fications are valid only for unflagged re- 
gions. In flagged regions, the presence of 
sequence differences is detected, but no at- 
tempt is made to identify the sequence 
without further analysis. 

Sequence analysis was carried out on the 
2.5-kb target from 12 samples. A total of 
30,582 bp containing 180 substitutions relative 
to mt1 were analyzed. Ninety-eight par- 
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Fig. 1 . (A) Design of a 4L tiled array. Each position in the 
target sequence (uppercase letters) is queried by a set 
of four probes on the chip (lowercase letters), identical 
except at a single position, termed the substitution po- 
sition, which is either A, C, G, or T (blue indicates 
complementarity, red a mismatch). Two sets of probes 
are shown, querying adjacent positions in the target. (B) 
Effect of a change in the target sequence. The probes 
are the same as in (A), but the target now contains a 
single-base substitution (base C, shown in green). The 
probe set querying the changed base still has a perfect 
match (the G probe). However, probes in adjacent sets 
that overlap the altered target position now have either 
one or two mismatches (red) instead of zero or one, 
because they were designed to match the target shown 
in (A). (C) Hybridization to a 4L tiled array and detection 
of a base change in the target. The array shown was 
designed to the mt1 sequence. (Top) Hybridization to 
mt1. The substitution used in each row of probes is 
indicated to the left of the image. The target sequence 
can be read 5' to 3' from left to right as the complement 
of the substitution base with the brightest signal. With 
hybridization to mt2 (bottom), which differs from mt1 in 
this region by a T→C transition, the G probe at position 
16,493 is now a perfect match, with the other three 
probes having single-base mismatches (A 5, C 3, G 37, 
T 4 counts). However, at flanking positions, the probes 
have either single- or double-base mismatches, be- 
cause the mt2 transition now occurs away from the 
query position. 
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cent of the sequence was unambiguously as- 
signed by a Bayesian base identification algorithm 
(16). Of this 98%, which contained 
both wild-type sequence and a high propor- 
tion of single-base footprints such as the 
example shown in Fig. 2A, 29,878 out of 
29,879 bp were identified correctly (17). 
The remaining 2% of the sequence, which 
contained the multiple substitution foot- 
prints (such as those shown in Fig. 2, B and 
C), was flagged for further analysis. Of the 
649 bp composing this 2%, 643 bp were 
located in or immediately adjacent to foot- 
prints (18). In all, 179 out of the 180 poly- 
morphisms were unambiguously detected, 
126 out of 127 were identified correctly in 
the unflagged regions, and 53 polymorphisms 
occurring in the flagged regions were 
detected as footprints. There were no un- 
flagged false-positive base identifications, 
and only one false-positive footprint. These 
figures can be considered to be "worst case" 
estimates for the type of array and target 
used. The P° sequence represents a Cauca- 
sian haplotype, and our sample set included 
eight African samples having a large number 
of clustered differences to P°. Furthermore, 
the variation in the hypervariable part of the 
control region is much higher than for the 
rest of the mitochondrial genome and for 
nuclear genes in general (Fig. 2 shows com- 
parisons to African samples in this region). 



The determination of a complete human 
mitochondrial DNA sequence more than 15 
years ago has had a tremendous influence on 
studies of human origins and evolution and 
the role of mutations in degenerative diseases 
(8, 10, 19). Because of the cost and difficulty 
of conventional sequence analysis, most sub- 
sequent sequencing studies have focused only 
on two small hypervariable regions totaling 
~600 bp (9). However, access to the entire 
genome is required for a full understanding of 
the governing genetics. We therefore designed 
a P 25,13 tiling array for the mitochondrial 
genome. The array contains a total of 
136,528 synthesis cells, each ~35 μm by 35 
μm in size (Fig. 3). In addition to a 4L tiling 
across the genome, the array contains a set of 
probes representing a single-base deletion at 
every position across the genome and sets of 
probes designed to match a range of specific 
mtDNA haplotypes. Using long-range poly- 
merase chain reaction, we amplified the 16.6- 
kb mtDNA directly from genomic DNA sam- 
ples (20). Labeled RNA targets were prepared 
by in vitro transcription and hybridized to the 
array. Genomic hybridization patterns were 
imaged in less than 10 min by a high-resolution 
confocal scanner (21). 

The hybridization pattern of a 16.6-kb 
target to the mitochondrial genome chip is 
shown in Fig. 3. Although there are some 
regions of low intensity, most of the 25- 



nucleotide oligomer array hybridized effi- 
ciently: Simply by identifying the highest 
intensity in each column of four substitu- 
tion probes, 99.0% of the mt3 sequence 
could be read correctly (P° sequence = 
mt3). The array was used to successfully 
detect three disease-causing mutations in a 
mtDNA sample from a patient with Leber's 
hereditary optic neuropathy (22, 23) (Fig. 
3C). In addition, we detected a total of 
seven errors and new polymorphisms from 
previously unsequenced regions. 
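
The direct read described above (take the brightest of the four substitution probes in each column, then report its complement as the target base) can be sketched as follows. This is a minimal illustration, not the authors' software; the function names are hypothetical, and the only data used are the counts quoted in the Fig. 1C legend for position 16,493.

```python
# Minimal sketch of a direct read from a 4L tiled array.
# Assumption: each column is given as a dict of probe intensities keyed by
# the substitution base (A, C, G, T). Function names are hypothetical.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def call_target_base(counts):
    """Return the target base implied by one column of four probes.

    The brightest probe is taken as the perfect match, so the target base
    is the complement of that probe's substitution base.
    """
    brightest = max(counts, key=counts.get)
    return COMPLEMENT[brightest]


def read_target(columns):
    """Read a target sequence from consecutive probe columns."""
    return "".join(call_target_base(column) for column in columns)


# Counts quoted in the Fig. 1C legend for position 16,493 (mt2 target).
column_16493 = {"A": 5, "C": 3, "G": 37, "T": 4}
print(call_target_base(column_16493))  # the G probe is brightest, so the target base is C
```

A read of this kind ignores the flanking footprint entirely, which is why the report pairs it with the footprint check before accepting a polymorphism.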

We then hybridized 10 genomes from 
African individuals to the array and unam- 
biguously identified 505 polymorphisms. 
These were polymorphisms that could be 
clearly read and for which a confirmatory 
footprint was detected automatically. For 
the 10 samples, the 2.5-kb cytochrome b 
and control region sequences were known 
(17). No false positives were detected in the 
~25 kb of sequence checked in this way. 
Additional clustered polymorphisms were 
detected by the presence of footprints but 
not read directly. A detailed analysis of the 
polymorphisms in these genomes, and oth- 
ers, will be presented elsewhere. 

The throughput of a conventional gel- 
based sequencer, with an average read 
length of 400 nucleotides and 48 lanes 
that is run twice a day, might be two 
mitochondrial genomes a day at best. In 
contrast, the throughput of the nonopti- 
mized system we describe is five chips per 
hour. Thus, 50 genomes can be read by 
hybridization in the time it takes to read 
two genomes conventionally. Further- 
more, there are significant reductions in 
sample preparation requirements because 
the entire genome is labeled in a single 
reaction, so the cost is similar to that for a 
single sequencing reaction. Also, sequence 
reading at the level of data analysis is 
automated: The sequences can be read in a 
matter of minutes. No analytical separa- 
tions or gel preparation is needed, which 
contributes to the speed of the experi- 
ment. Although the inability to read all 
possible sequences is a weakness of the 4L 
tiled array, it is not a major limitation, 
because in practice the small number of 
ambiguities can be checked by targeted 
conventional sequencing. In particular, 
highly repetitive sequences, such as long 
runs of a single base, are presently best 
analyzed with conventional technology. 
Finally, a clear advantage to the approach 
we describe is that it is highly scalable. 
The cost, effort, and time required to an- 
alyze the entire 16.6-kb mtDNA in a sin- 
gle experiment is virtually identical to 
that required to read 2.5 kb. This provides 
a clear path to further orders-of-magnitude 
improvements in efficiency. 

High-density oligonucleotide arrays 



Fig. 2. Detection of base 
differences in a 2.5-kb 
region by comparison of 
scaled P° hybridization 
intensity patterns be- 
tween a sample (green) 
and a reference (red) se- 
quence. (A) Comparison 
of sequence ief007 to 
mt1. In the region 
shown, there is a single- 
base difference between 
the two sequences, lo- 
cated at position 16,223 
(C in mt1, T in ief007). 
This results in a "footprint" 
spanning ~20 positions, 
11 to the left and 
8 to the right of position 
16,223, in which the 
ief007 P° intensities are 
decreased by a factor of 
more than 10 on average relative to the mt1 intensities. The predicted footprint location is indicated by the 
gray bar, and the location of the polymorphism is shown by a vertical black line within the bar. The size of 
a footprint changes with probe length, and its relative position with substitution position (not shown). (B) 
Comparison of sequence ha001 to mt1. The ha001 target has four polymorphisms relative to mt1. The P° 
intensity pattern clearly shows two regions of difference between the targets. Each region contains two or 
more differences, because in both cases the footprints are longer than 20 positions and therefore are too 
extensive to be explained by a single-base difference. The effect of competition can be seen by comparing 
the mt1 intensities in the ief007 and ha001 experiments: The relative intensities of mt1 are greater in (B) 
where ha001 contains P° mismatches but ief007 does not. (C) The ha004 sequence has multiple 
differences to mt1, resulting in a complex pattern extending over most of the region shown. Thus, 
differences are clearly detected. Because hybridization intensities are extremely sequence-dependent, 
each of the mitochondrial sequences can also be identified simply by its hybridization pattern. 
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Fig. 3. Human mito- 
chondrial genome on a 
chip. (A) An image of the 
array hybridized to 16.6 
kb of mitochondrial target 
RNA (L strand). The 
16,569-bp map of the 
genome is shown, and 
the H strand origin of rep- 
lication (Oh), located in 
the control region, is indi- 
cated. (B) A portion of 
the hybridization pattern 
magnified. In each column 
there are five 
probes: A, C, G, T, and Δ, 
from top to bottom. The 
Δ probe has a single-base 
deletion instead of a 
substitution and hence is 
24 instead of 25 bases in 
length. The scale is indi- 
cated by the bar beneath 
the image. Although 
there is considerable se- 
quence-dependent in- 
tensity variation, most of 
the array can be read di- 
rectly. The image was 
collected at a resolution 
of ~100 pixels per probe 
cell. (C) The ability of the 
array to detect and read 

single-base differences in a 16.6-kb sample is illustrated. Two different target sequences were hybridized 
in parallel to different chips. The hybridization patterns are compared for four different positions in the 
sequence. Only the P 25,13 probes are shown. The top panel of each pair shows the hybridization of the mt3 
target, which matches the chip P° sequence at these positions. The lower panel shows the pattern 
generated by a sample from a patient with Leber's hereditary optic neuropathy (LHON). Three known 
pathogenic mutations, LHON3460, LHON4216, and LHON13708, are clearly detected. For comparison, 
the fourth panel in the set shows a region around position 11,778 that is identical in both samples. 




provide the foundation for a powerful ge- 
netic analysis technology. The method 
can be used to characterize the spectrum 
of sequence variation in a population and 
can be applied to the analysis of many 
genes in parallel. In the case of human 
mtDNA, we simultaneously analyzed the 
control region, 13 protein coding genes, 
22 tRNA genes, and 2 ribosomal RNA 
genes. The methods described here can be 
applied to other research areas in molec- 
ular genetics; for example, the ability to 
identify and sequence polymorphisms pro- 
vides a basis for genetic mapping. The 
specificity of oligonucleotide hybridiza- 
tion and the scalability of the method 
suggest the possibility of a dedicated array 
that could be used to generate a high- 
resolution genetic map of an entire ge- 
nome in a single experiment. Likewise, 
the concepts and techniques described 
here have been used to develop approach- 
es for mRNA identification and the large- 
scale, parallel measurement of expression 
levels (24). Thus, the sequence of a gene, 
its spectrum of change in the population, 
its chromosomal location, and its dynam- 



ics of expression (all essential to a full 
understanding of function) can be deter- 
mined with high-density probe arrays. The 
challenge now is to synthesize and read 
probe arrays at even higher density. For 
example, a 2 cm by 2 cm array, synthesized 
with probes occupying 1-μm synthesis 
sites in a 4L tiling, could query the entire 
coding content of the human genome, 
estimated at 100,000 genes. 
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RNA polymerase (1 U/jxl) (Promega) in a reaction 
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the 5' end with fluorescein (5' -CTGAACGGTAG- 
CATCTTGAC). Samples were denatured at 95°C for 
5 min, chilled on ice for 5 min, and equilibrated to 
37°C. A volume of 180 μl of hybridization solution was 
then added to the flow cell [R. Lipshutz et al., Biotechniques 
19, 442 (1995)] and the chip incubated at 37°C 
for 3 hours with rotation at 60 rpm. The chip was 
washed six times at room temperature with 6× SSPE 
(0.9 M NaCl, 60 mM NaH2PO4, 6 mM EDTA, pH 7.4), 
0.005% Triton X-100. Phycoerythrin-conjugated 
streptavidin (2 μg/ml in 6× SSPE, 0.005% Triton 
X-100) was added and incubation continued at room 
temperature for 5 min. The chip was washed again 
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and scanned at a resolution of -74 pixels per probe 
cell. Two scans were collected: a fluorescein scan 
was obtained with a 515- to 545-nm band-pass filter, 
and a phycoerythrin scan with a 560-nm long-pass 
filter. Signals were separated to remove spectral over- 
lap and average counts per cell determined. 

14. Each 2.5-kb target sequence was PCR-amplified directly 
from genomic DNA with the primer pair 
L14675-T3 (5'-aattaaccctcactaaagggATTCTCGCACGGACTACAAC) 
and H667-T7 (11). 

15. To scale the sample to the reference intensities, we 
constructed a histogram of the base 10 logarithm of 
the intensity ratios for each pair of probes. The histogram 
had a mesh size of 0.01 and was smoothed 
by replacing the value at each point with the average 
number of counts over a five-point window centered 
at that point. The highest value in the histogram was 
located, and the resulting intensity ratio was taken to 
be the most probable calibration coefficient. 
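
The scaling procedure in note 15 can be sketched as below. This is an illustrative reading, not the authors' code: the note does not state which intensity is the numerator, so the sample/reference direction is an assumption, as are the function name and the tie-breaking rule used when the smoothed histogram has a flat top.

```python
import math
from collections import Counter


def calibration_ratio(reference, sample, mesh=0.01, window=5):
    """Estimate the most probable sample/reference intensity ratio.

    Builds a histogram of log10(sample/reference) with the given mesh size,
    smooths it with a centered moving average, and returns the ratio at the
    highest smoothed bin. Ratio direction and tie-breaking are assumptions.
    """
    bins = Counter()
    for ref, smp in zip(reference, sample):
        if ref > 0 and smp > 0:  # skip pairs whose log ratio is undefined
            bins[round(math.log10(smp / ref) / mesh)] += 1
    if not bins:
        raise ValueError("no probe pairs with positive intensities")

    half = window // 2
    best_bin, best_score = None, (-1.0, -1)
    for b in range(min(bins), max(bins) + 1):
        smoothed = sum(bins.get(b + k, 0) for k in range(-half, half + 1)) / window
        # Break ties on a flat smoothed peak by preferring the fuller raw bin.
        score = (smoothed, bins.get(b, 0))
        if score > best_score:
            best_bin, best_score = b, score
    return 10 ** (best_bin * mesh)


# Toy data: most probes are twice as bright in the sample; a few are mismatched.
reference = [100.0] * 55
sample = [200.0] * 50 + [10.0] * 5
print(round(calibration_ratio(reference, sample), 2))
```

Taking the mode rather than the mean keeps the mismatched probes, which form a separate low-ratio population, from biasing the calibration coefficient.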

16. Base identification was accomplished with a Bayes- 
ian classification algorithm based on variable kernel 
density estimation. The likelihood of each identifica- 
tion associated with a set of hybridization intensity 
values was computed by comparing an unknown set 
of probes to a set of example cases for which the 
correct base identification was known. The resulting 
four likelihoods were then normalized so that they 
summed to 1. Data from both strands were combined 
by averaging the values. If the most likely base 
identification had an average normalized likelihood 
greater than 0.6, it was called; otherwise the base 
was called as an ambiguity. The example set was 
derived from two different samples, ib013 and 
ief005, which have a total of 35 substitutions relative 
to mt1, of which 19 are shared with the 12 samples 
analyzed and 16 are not. Identification performance 
was not sensitive to the choice of examples. 
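
The calling rule at the end of note 16 (normalize the four likelihoods, average the two strands, call only above 0.6) can be sketched as follows. The kernel density estimation that produces the likelihoods is not reproduced here; the function names and the use of "N" to mark an ambiguity are assumptions made for illustration.

```python
BASES = "ACGT"


def normalize(likelihoods):
    """Scale the four base likelihoods so that they sum to 1."""
    total = sum(likelihoods[base] for base in BASES)
    return {base: likelihoods[base] / total for base in BASES}


def call_base(strand_a, strand_b, threshold=0.6, ambiguous="N"):
    """Combine two strands and call a base, or flag an ambiguity.

    strand_a and strand_b map each base to an unnormalized likelihood.
    The "N" symbol for an ambiguity is an assumption, not from the source.
    """
    norm_a, norm_b = normalize(strand_a), normalize(strand_b)
    averaged = {base: (norm_a[base] + norm_b[base]) / 2 for base in BASES}
    best = max(averaged, key=averaged.get)
    return best if averaged[best] > threshold else ambiguous


# Strands agree on A with normalized likelihoods 0.8 and 0.6 (average 0.7).
print(call_base({"A": 8, "C": 1, "G": 0.5, "T": 0.5},
                {"A": 6, "C": 2, "G": 1, "T": 1}))
```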

17. To provide an independently determined reference 
sequence, each 2.5-kb PCR amplicon was se- 
quenced on both strands by primer-directed fluores- 
cent chain-terminator cycle sequencing with an ABI 
373A DNA sequencer and assembled and manually 
edited with Sequencher 3.0. The analysis presented 
here assumes that the sequence amplified from 
genomic DNA is essentially clonal [R. J. Monnat and 
L. A. Loeb, Proc. Natl. Acad. Sci. U.S.A. 82, 2895 
(1985)] and that its determination by gel-based 
methods is correct. A frequent length polymorphism 
at positions 303 to 309 was not detected by hybridization 
under the conditions used. It was excluded 
from analysis and is not part of the set of 180 polymorphisms 
discussed in the text. However, polymorphisms 
at this site have previously been differentiated 
by oligonucleotide hybridization [M. Stoneking, 
D. Hedgecock, R. G. Higuchi, L. Vigilant, H. A. 
Erlich, Am. J. Hum. Genet. 48, 370 (1991)]. 

18. The P° intensity footprints were detected in the following 
way: The reference and sample intensities 
were normalized (15), and R, the average of 
log(I_reference/I_sample) over a window of five positions, 
centered at the base of interest, was calculated 
for each position in the sequence. Footprints 
were detected as regions having at least five contiguous 
positions with a reference or sample intensity at 
least 50 counts above background and an R value in 
the top 10th percentile for the experiment. At 205 
polymorphic sites, where the sample was mismatched 
to P°, the mean R value was 1.01, with a 
standard deviation of 0.57. At 35,333 nonpolymor- 
phic sites (that is, where both reference and sample 
had a perfect match to P°) the mean value was 
-0.05, with a standard deviation of 0.25. 
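
The footprint rule in note 18 can be sketched as below. This is one possible reading under stated assumptions: base 10 logarithms (suggested by note 15 but not stated here), a window truncated at the sequence ends, "top 10th percentile" taken as the highest tenth of R values, and hypothetical function names throughout.

```python
import math


def window_mean(values, center, width=5):
    """Average of values in a centered window, truncated at the ends."""
    half = width // 2
    lo, hi = max(0, center - half), min(len(values), center + half + 1)
    return sum(values[lo:hi]) / (hi - lo)


def find_footprints(reference, sample, background=0.0, min_counts=50.0,
                    min_run=5, top_fraction=0.10):
    """Return footprints as (start, end) index pairs, end exclusive.

    Assumes normalized, strictly positive intensities. A position qualifies
    if its windowed log ratio R is in the highest tenth for the experiment
    and either intensity is at least min_counts above background.
    """
    log_ratio = [math.log10(r / s) for r, s in zip(reference, sample)]
    r_values = [window_mean(log_ratio, i) for i in range(len(log_ratio))]

    ranked = sorted(r_values)
    keep = max(1, round(len(ranked) * top_fraction))
    cutoff = ranked[len(ranked) - keep]

    footprints, run_start = [], None
    for i, r_value in enumerate(r_values):
        bright = max(reference[i], sample[i]) - background >= min_counts
        if r_value >= cutoff and bright:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_run:
                footprints.append((run_start, i))
            run_start = None
    if run_start is not None and len(r_values) - run_start >= min_run:
        footprints.append((run_start, len(r_values)))
    return footprints


# Toy data: a 20-fold loss of sample signal across ten positions.
reference = [1000.0] * 100
sample = [1000.0] * 40 + [50.0] * 10 + [1000.0] * 50
print(find_footprints(reference, sample))
```

Because the cutoff is a percentile rather than an absolute value, some positions always pass the R test, so the contiguity and intensity conditions carry much of the specificity in this sketch.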

19. R. L. Cann, M. Stoneking, A. C. Wilson, Nature 325, 31 
(1987); M. Zeviani et al., Am. J. Hum. Genet. 47, 904 
(1990); D. C. Wallace, Annu. Rev. Biochem. 61, 1175 
(1992); S. Horai, K. Hayasaka, R. Kondo, K. Tsugane, 
N. Takahata, Proc. Natl. Acad. Sci. U.S.A. 92, 532 
(1995); T. Hutchin and G. Cortopassi, ibid., p. 6892. 

20. Long-range PCR amplification was carried out on 
genomic DNA with Perkin-Elmer GeneAmp XL PCR 
reagents according to the manufacturer's protocol. 
Primers were L14836-T3 (5'-aattaaccctcactaaagggATGAAACTTCGGCTCACTCCTTGGCG) 
and RH1066-T7 (5'-taatacgactcactatagggaTTTCATCATGCGGAGATGTTGGATGG), 
based on RH1066 [S. Cheng, R. 
Higuchi, M. Stoneking, Nature Genet. 7, 350 (1994)]. 
Each 100-μl reaction contained 0.2 μM concentration 
of each primer and ~10 to 50 ng of total genomic 
DNA. Transcription reactions were carried out in 10 μl 
with Ambion MAXIscript kit according to the manufacturer's 
protocol. The concentration of the 16.6-kb PCR 
template was ~2 nM, and the reaction contained Ambion 
1× biotin-14-CTP/NTP mix and 0.2 mM biotin-16-UTP. 
Incubation was at 37°C for 2 hours. Fragmentation 
and hybridization were as described (13), 
except that 3.5 M TMACl and the biotin-labeled oligonucleotide 
5'-CTGAACGGTAGCATCTTGAC were 
used in the hybridization buffer, which also contained 
fragmented baker's yeast RNA (100 μg/ml) (Sigma). 
Hybridization was carried out at 40°C for 4 hours. 
21. A custom telecentric objective lens with a numerical 
aperture of 0.25 focuses 5 mW of 488-nm argon laser 
light to a 3-μm-diameter spot, which is scanned by a 
galvanometer mirror across a 14-mm field at 30 lines 
per second. Fluorescence collected by the objective is 
descanned by the galvanometer mirror, filtered by a 
dichroic beamsplitter (555 nm) and a band-pass filter 
(555 to 607 nm), focused onto a confocal pinhole, 
and detected by a photomultiplier. Photomultiplier 
output is digitized to 12 bits. A 4096 by 4096 pixel 
image is obtained in less than 3 min. Pixel size is 3.4 
μm. The data from four sequential scans were 
summed to improve the signal-to-noise ratio. 



The nucleosome has an active role in gene 
regulation. Mutations of the core histones 
have specific consequences for the transcription 
of particular genes (1). The specificity 
of these effects can be explained both 
by the positioning of histones with respect 
to DNA sequence (2) and the potential 
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Donald. D. C. Wallace. FASEB J. 6, 2791 (1992). 

23. Mitochondrial DNA populations can contain more than 
one sequence type, in a condition known as hetero- 
plasmy. The LHON mutations shown in Fig. 3C were 
characterized as being homoplasmic by conventional 
sequencing and restriction endonuclease digestion (M. 
Brown, personal communication). In controlled mixing 
experiments, we have shown that sequences present at 
the level of 10% can easily be detected by hybridization 
(M. Chee and R. Yang, unpublished results; N. Shen, 
personal communication). The sensitivity of detection is 
sequence dependent. Importantly, hybridization can be 
used to detect heterozygous nuclear DNA sequences 
(J. Hacia et al., in preparation). 
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targeting of histone modifications to partic- 
ular nucleosomes (3). Thus, an understand- 
ing of nucleosomal architecture is central to 
understanding the transcription process. 

The nucleosome contains two molecules 
of each of the four core histones (H2A, H2B, 
H3, and H4), a single molecule of a linker 
histone (H1, H1°, or H5), and ~180 base 
pairs (bp) of DNA (4). In isolation, the core 
histones assemble into an octameric complex 
(5) whose structure has been determined at 
3.1 Å resolution (6-8). The exact path of 
DNA on the surface of the histone octamer, 
the position of the linker histone molecule 
within the nucleosome, and the path of linker 
DNA between adjacent nucleosomes (9-11) 
remain to be determined. 

We used positioned nucleosomes containing 
the Xenopus borealis somatic 5S ribosomal 
RNA (rRNA) gene to examine 
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DNA Gyres 
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Histone-DNA contacts within a nucleosome influence the function of trans-acting factors 
and the molecular machines required to activate the transcription process. The internal 
architecture of a positioned nucleosome has now been probed with the use of photo- 
activatable cross-linking reagents to determine the placement of histones along the DNA 
molecule. A model for the nucleosome is proposed in which the winged-helix domain of 
the linker histone is asymmetrically located inside the gyres of DNA that also wrap around 
the core histones. This domain extends the path of the protein superhelix to one side of 
the core particle. 






SCIENCE • VOL. 274 • 25 OCTOBER 1996 



Proc. Natl. Acad. Sci. USA 

Vol. 86, pp. 6230-6234, August 1989 

Genetics 



Genetic analysis of amplified DNA with immobilized 
sequence-specific oligonucleotide probes 

(polymerase chain reaction/"reverse dot blots"/nonradioactive detection/HLA-DQA locus/β-thalassemia) 

Randall K. Saiki*, P. Sean Walsh*, Corey H. Levenson†, and Henry A. Erlich* 

Departments of *Human Genetics and †Chemistry, Cetus Corp., 1400 Fifty-Third Street, Emeryville, CA 94608 
Communicated by Hamilton O. Smith, May 9, 1989 (received for review March 2, 1989) 



ABSTRACT The analysis of DNA for the presence of 
particular mutations or polymorphisms can be readily accom- 
plished by differential hybridization with sequence-specific 
oligonucleotide probes. The in vitro DNA amplification tech- 
nique, the polymerase chain reaction (PCR), has facilitated the 
use of these probes by greatly increasing the number of copies 
of target DNA in the sample prior to hybridization. In a 
conventional assay with immobilized PCR product and labeled 
oligonucleotide probes, each probe requires a separate hybrid- 
ization. Here we describe a method by which one can simul- 
taneously screen a sample for all known allelic variants at an 
amplified locus. In this format, the oligonucleotides are given 
homopolymer tails with terminal deoxyribonucleotidyltransferase, 
spotted onto a nylon membrane, and covalently bound 
by UV irradiation. Due to their long length, the tails are 
preferentially bound to the nylon, leaving the oligonucleotide 
probe free to hybridize. The target segment of the DNA sample 
to be tested is PCR-amplified with biotinylated primers and 
then hybridized to the membrane containing the immobilized 
oligonucleotides under stringent conditions. Hybridization is 
detected nonradioactively by binding of streptavidin-horseradish 
peroxidase to the biotinylated DNA, followed by a simple 
colorimetric reaction. This technique has been applied to HLA-DQA 
genotyping (six types) and to the detection of Mediterranean 
β-thalassemia mutations (nine alleles). 

Differential hybridization with sequence-specific oligonucle- 
otide probes has become a widely used technique for the 
detection of genetic mutations and polymorphisms (1-5). 
When hybridized under the appropriate conditions, these 
synthetic DNA probes (usually 15-20 bases in length) will 
anneal to their complementary target sequences in the sample 
DNA only if they are perfectly matched. In most cases, the 
destabilizing effect of a single base-pair mismatch is sufficient 
to prevent the formation of a stable probe-target duplex (6). 
With an appropriate selection of oligonucleotide probes, the 
relevant genetic content of a DNA sample can be completely 
described. 

This very powerful method of DNA analysis has been 
greatly simplified by the in vitro DNA-amplification tech- 
nique, the polymerase chain reaction (PCR) (7-9). The PCR 
can selectively increase the number of copies of a particular 
DNA segment in a sample by many orders of magnitude. As 
a result of this 10^6- to 10^7-fold amplification, more convenient 
assays and nonradioactive detection methods have become 
possible (10-12). These PCR-based assays are usually done 
by amplifying the target segment in the sample to be tested, 
fixing the amplified DNA onto a series of nylon membranes, 
and hybridizing each membrane with one of the labeled 
oligonucleotide probes under stringent hybridization condi- 
tions. However, each probe must still be individually hybrid- 
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ized to the amplified DNA and the process can easily become 
difficult in a system where many different mutations or 
polymorphisms occur. 

One approach to address this procedural difficulty is to 
"reverse" the DNAs: attach the oligonucleotides to the 
nylon support and hybridize the amplified sample to the 
membrane. Thus, in a single hybridization reaction, an entire 
series of sequences could be analyzed simultaneously. The 
strategy we adopted was to immobilize the oligonucleotides 
onto nylon filters by ultraviolet fixation. Exposure to UV 
light activates thymine bases in DNA, which then covalently 
couple to the primary amines present in nylon (13). It seemed 
unlikely, however, that short oligonucleotides could be di- 
rectly attached to nylon in this manner and still retain their 
ability to discriminate at the level of a single base-pair 
mismatch. Consequently, the addition of a long deoxyribo- 
thymidine homopolymer tail, poly(dT), to the 3' end of the 
oligonucleotide appeared promising for several reasons. 
First, the poly(dT) tail would be a larger target for UV 
crosslinking and should preferentially react with the nylon. 
Second, dTTP is very readily incorporated onto the 3' ends 
of oligonucleotides by terminal deoxyribonucleotidyltrans- 
ferase and would permit the synthesis of very long tails (14). 
(Deoxyribothymidine would also be the most efficiently 
incorporated base if a purely synthetic route were chosen.) 
Third, Collins and Hunsaker (15) had shown that the pres- 
ence of a poly(dA) homopolymer tail, used to introduce 
multiple 35S labels, did not affect the function of sequence- 
specific oligonucleotide probes. 

We have used this technique to attach oligonucleotide 
probes specific for the six major HLA-DQA DNA types (16) 
and the eight most common Mediterranean β-thalassemia 
mutations (4) to nylon filters. The target segment of the DNA 
sample to be tested (either HLA-DQA or β-globin) was 
amplified by PCR with biotin-labeled primers to introduce a 
nonradioactive tag. Hybridization of the amplified product to 
the immobilized oligonucleotides and binding of streptavidin- 
horseradish peroxidase conjugate to the biotinylated primers 
were performed simultaneously. Detection was accom- 
plished by a simple colorimetric reaction involving the en- 
zymatic oxidation of a colorless chromogen that yielded a red 
color wherever hybridization occurred. 

MATERIALS AND METHODS 

Tailing of Oligonucleotides. Oligonucleotides were synthe- 
sized on a DNA synthesizer (model 8700, Biosearch) with 
β-cyanoethyl N,N-diisopropylphosphoramidite nucleosides 
(American Bionetics, Hayward, CA) by using protocols 
provided by the manufacturer. Oligonucleotide (200 pmol) 
was tailed in 100 μl of 100 mM potassium cacodylate/25 mM 
Tris·HCl/1 mM CoCl2/0.2 mM dithiothreitol, pH 7.6 (17), 
with 5-160 nmol deoxyribonucleoside triphosphate (dTTP or 

Abbreviation: PCR, polymerase chain reaction. 
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dCTP) and 60 units (50 pmol) of terminal deoxyribonucleoti- 
dyltransferase (Ratliff Biochemicals, Los Alamos, NM) for 
60 min at 37°C. Reactions were stopped by addition of 100 μl 
of 10 mM EDTA. The lengths of the homopolymer tails were 
controlled by limiting dTTP or dCTP. For example, a nominal 
tail length of 400 dT residues was obtained by using 80 nmol 
of dTTP in the above reaction. 
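
The nominal tail length quoted above follows from the molar ratio of nucleotide to oligonucleotide. A small worked sketch, assuming that all of the limiting nucleotide is incorporated and is spread evenly over the 200 pmol of oligonucleotide (the function name is hypothetical):

```python
def nominal_tail_length(dntp_nmol, oligo_pmol=200.0):
    """Nominal residues per tail = mol of dNTP / mol of oligonucleotide.

    Assumes complete, evenly distributed incorporation of the limiting dNTP.
    """
    return dntp_nmol * 1000.0 / oligo_pmol  # convert nmol to pmol first


print(nominal_tail_length(80))   # 400.0 residues, the example in the text
print(nominal_tail_length(5))    # 25.0 residues, low end of the 5-160 nmol range
print(nominal_tail_length(160))  # 800.0 residues, high end of the range
```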

Preparation of Filters. The tailed oligonucleotides were 
diluted into 100 μl of TE (10 mM Tris·HCl/0.1 mM EDTA, 
pH 8.0) and applied to a nylon membrane (Genetrans-45; 
Plasco, Woburn, MA) with a spotting manifold (BioDot; 
BioRad). The damp filters were then placed on TE-soaked 
paper pads in a UV light box (Stratalinker 1800; Stratagene) 
and irradiated at 254 nm. Dosage was controlled by the 
device's internal metering unit. The irradiated membranes 
were washed in 200 ml of 5× SSPE (1× SSPE is 180 mM 
NaCl/10 mM NaH2PO4/1 mM EDTA, pH 7.2) with 0.5% 
NaDodSO4 for 30 min at 55°C to remove unbound oligonucleotides. 
If not used immediately, the filters were rinsed in 
water, air-dried, and stored at room temperature until 
needed. 

Amplification of DNA. PCR amplification of genomic se- 
quences was performed by a slight modification of previously 
described procedures (9). DNA (0.1-0.5 μg) was amplified in 
100 μl containing 50 mM KCl, 10 mM Tris·HCl (pH 8.4), 1.5 
mM MgCl2, 10 μg of gelatin, 200 μM each dATP, dCTP, 
dGTP, and dTTP, 0.2 μM each biotinylated amplification 
primer, and 2.5 units of Thermus aquaticus (Taq) DNA 
polymerase (Perkin-Elmer/Cetus). The cycling reaction was 
done in a programmable heat block (DNA Thermal Cycler; 
Perkin-Elmer/Cetus) set to heat at 95°C for 15 sec (denature), 
cool at 55°C for 15 sec (anneal), and incubate at 72°C for 30 
sec (extend) by the "Step-Cycle" program. After 30 repeti- 
tions, the samples were incubated an additional 5 min at 72°C. 
The primers contained a single molecule of biotin attached to 
the 5' end of the oligonucleotides (described below). 

Hybridization and Detection of Amplified DNA. Each filter 
with bound oligonucleotides was placed in 4 ml of hybridization 
solution containing 5× SSPE, 0.5% NaDodSO4, and 
400 ng of streptavidin-horseradish peroxidase conjugate 
(SeeQuence; Eastman Kodak). PCR-amplified DNA (20 μl) 
was denatured by addition of an equal volume of 400 mM 
NaOH/10 mM EDTA and added immediately to the hybrid- 
ization solution, which was then incubated at 55°C for 30 min. 
(During this incubation, hybridization of PCR product to 
immobilized oligonucleotide and binding of streptavidin- 
horseradish peroxidase to biotin present in the PCR product 
occur simultaneously.) The filters were briefly rinsed twice in 
2x SSPE/0.1% NaDodSO4 at room temperature, washed 
once in 2x SSPE/0.1% NaDodSO4 at 55°C for 10 min, and 
then briefly rinsed twice in 2x PBS (1x PBS is 137 mM 
NaCl/2.7 mM KCl/8 mM Na2HPO4/1.5 mM KH2PO4, pH 
7.4) at room temperature. Color development was performed 
by incubating the filters in 25-50 ml of red leuco dye 
(Eastman Kodak) at room temperature for 5-10 min. Photo- 
graphs were taken for permanent records. 

Synthesis of Biotinylated Oligonucleotide Primers. Primary 
amino groups were introduced at the 5' termini of the primers 
by a variation of published procedures (18, 19). In brief, 
tetraethylene glycol was converted to the monophthalimido 
derivative by reaction with phthalimide in the presence of 
triphenylphosphine and diisopropyl azodicarboxylate (20). 
The monophthalimide was converted to the corresponding 
β-cyanoethyl diisopropylamino phosphoramidite by standard 
protocols (21). The resulting phthalimido amidite was added 
to the 5' ends of the oligonucleotides during the final cycle of 
automated DNA synthesis by using standard coupling con- 
ditions. During normal deprotection of the DNA (concen- 
trated aqueous ammonia for 5 hr at 55°C), the phthalimido 
group was converted to a primary amine, which was subsequently 
acylated with an appropriate biotin active ester. 
NHS-LC-biotin (Pierce) was selected for its water solubility 
and lack of steric hindrance. The biotinylation was performed 
on crude, deprotected oligonucleotide, and the mixture was 
purified by a combination of gel filtration and reversed-phase 
HPLC. Additional details of this procedure will be published 
elsewhere (22). 

RESULTS 

Binding and Hybridization Efficiency of Tailed Oligonucle- 
otides. The relative efficiencies with which synthetic oligo- 
nucleotides with homopolymer tails of various lengths were 
covalently bound to the nylon filter were measured as a 
function of UV exposure (Fig. 1 Left). Oligonucleotides with 
longer poly(dT) tails were more readily fixed to the membrane, 
and all attained their maximum values by 240 mJ/cm² 
of irradiation at 254 nm. In contrast, the (dC)400-tailed 
oligonucleotide required more irradiation to crosslink to the nylon 
and was not comparable to the equivalent (dT)400 construct 
even after 600 mJ/cm² exposure. This difference is consistent 
with the findings of Church and Gilbert (13) that suggested 
light-activated thymine bases bind more effectively to nylon 
than do cytosine bases. The untailed oligonucleotide was also 
retained by the membrane in a manner that roughly paralleled 
the poly(dC) product. 

Efficient binding of oligonucleotides to the membrane, 
however, does not necessarily correlate with hybridization 
efficiency, and so hybridization efficiency as a function of 
UV dosage was determined in a separate experiment (Fig. 1 
Right). These results show a distinct optimum of exposure 
that changes with the length of the poly(dT) tail and is more 
sharply pronounced for the longer tails. Additional experiments 
have shown the optimal dosages to be about 20 mJ/cm² 
for the (dT)800 and 40 mJ/cm² for the (dT)400 oligonucleotides 
(R.K.S., unpublished observations). The peak efficiencies of 
the (dT)400 and (dT)800 constructs are around 1% (45-50 fmol 
of radiolabeled probe annealed to ≈3.5 pmol of tailed 
oligonucleotide), which is similar to the value reported by Gamper 
et al. (23) for an oligonucleotide probe hybridized to 
nylon-bound plasmid DNA. 
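
The "around 1%" figure follows directly from the quantities quoted above. A minimal sketch of the arithmetic, using only the 45-50 fmol and 3.5 pmol values from the text (the function name is hypothetical):

```python
def hybridization_efficiency(probe_fmol: float, bound_oligo_pmol: float) -> float:
    """Fraction of immobilized oligonucleotide that annealed to labeled probe."""
    if bound_oligo_pmol <= 0:
        raise ValueError("bound_oligo_pmol must be positive")
    return (probe_fmol / 1000.0) / bound_oligo_pmol  # fmol -> pmol, then ratio


for fmol in (45.0, 50.0):
    eff = hybridization_efficiency(fmol, 3.5)
    print(f"{fmol:.0f} fmol / 3.5 pmol = {eff:.2%}")
```

Both ends of the quoted range come out slightly above 1%, consistent with the rounded value in the text.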

Comparison of the data in Fig. 1 Left and Right for 60 
mJ/cm² irradiation indicates that oligonucleotides with 
longer tails hybridize more effectively than can be accounted 
for by the additional amounts bound to the filter. This 
suggests a spacer effect wherein the poly(dT) tails improve 
hybridization efficiency by increasing the distance between 
the nylon membrane and the terminal oligonucleotide probe. 
Besides possible UV damage to the DNA itself, additional 
exposure causes more of the tail to become attached to the 
membrane, thus reducing the average spacer length and 
decreasing hybridization efficiency. The markedly different 
hybridization profile of the poly(dC) oligonucleotide is com- 
patible with this interpretation. Because cytosines react less 
efficiently with the filter, hybridization efficiency reaches a 
plateau where loss due to UV damage and tail shortening are 
compensated by the fixing of new molecules (see Fig. 1 Left). 
This characteristic of cytosine may make a poly(dC) tail 
desirable when UV irradiation cannot be carefully controlled. 
Under the stringent hybridization conditions used in this 
experiment, no signal was detected for the untailed oligonu- 
cleotide. 

DNA Typing at the HLA-DQA Locus. The HLA-DQA test is 
derived from a PCR-based oligonucleotide typing system that 
partitions the polymorphic variants at the DQA locus into 
four major DNA types, DQA1 to DQA4, and three DQA1 
subtypes, DQA1.1 to DQA1.3 (16). Four oligonucleotides 
specific for the major DQA types, four oligonucleotides that 
characterize the DQA1 subtypes, and one control oligonucleotide 
that hybridizes to all allelic DQA sequences (Table 

Fig. 1. Filter retention and hybridization efficiency of tailed oligonucleotides as a function of UV dosage and tail length. (Left) Filter 
retention. A 19-base oligonucleotide, 19A (5'-CTCCTGAGGAGAAGTCTGC-3'), was 5'-end-labeled with 32P by using phage T4 polynucleotide 
kinase and [γ-32P]ATP (10). Portions of the labeled oligonucleotide were given 3' homopolymer tails with terminal deoxyribonucleotidyltransferase 
and either dTTP or dCTP. The base compositions and lengths of the tails were as follows: (dT)0, (dT)25, (dT)50, (dT)100, (dT)200, (dT)400, 
(dT)800, and (dC)400. Four picomoles of each oligonucleotide was spotted onto nine duplicate filters, UV irradiated for various times, and washed 
to remove unbound oligonucleotides; each spot then was measured by scintillation counting to determine the amount crosslinked to the nylon. 
The values plotted are relative to an unirradiated, unwashed control filter (100% retention). (Right) Hybridization efficiency. Filters containing 
tailed, but unlabeled, 19A were prepared as described above and hybridized under sequence-specific conditions (see Materials and Methods) 
with a 32P-labeled 40-base oligonucleotide, RS24 (5'-CCCACAGGGCAGTAAGGGCAGACTTCTCCTCAGGAGTCAG-3'), complementary to 
19A. The specific activity of the [...] was 1.5 µCi/nmol (1 µCi = 37 kBq). Each spot was a [...] 
are fmol of RS24 hybridized to the membrane. 



1) were given 400-base poly(dT) tails and spotted onto nylon 
filters. The sequence variation that defines the DQA types is 
localized within a relatively small "hypervariable" region of 
the second exon (24) that can be encompassed within a single 
242-base-pair PCR amplification fragment. Biotinylated 
primers (Table 1) were used to amplify the DQA fragment 
from several genomic DNA samples: six homozygous cell 
lines and six heterozygous individuals. After hybridization of 
the amplified DNA to the membranes and color development, 
the DQA genotypes of these samples were readily apparent 
(Fig. 2). 

Table 1. Sequences of oligonucleotide primers and probes 



Although most of the oligonucleotide probes are uniquely 
specific for one DQA type, two of the DQA1 subtyping 
probes cross-hybridize to several DNA types. GH89 hybridizes 
to a sequence common to the DQA1.2, DQA1.3, and 
DQA4 types, and the probe GH76 detects all DQA types 
except DQA1.3. (The latter is needed to distinguish DQA1.2/ 
1.3 heterozygotes from DQA1.3/1.3 homozygotes.) The 
length and strand specificity of the oligonucleotides were 
empirically adjusted until their relative hybridization efficien- 
cies and stringency requirements for allelic discrimination 
were approximately the same. (This was achieved by deter- 



Name*          Function             Sequence 

RS151          DQA primer           b-GTGCTGCAGGTGTAAACTTGTACCAG† 
RS152          DQA primer           b-CACGGATCCGGTAGCAGCGGTAGAGTTG† 
RH54 (2)       All DQA types        CTACGTGGACCTGGAGAGGAAGGAGACTGCCTG 
GH75 (4)       DQA1 probe           CTCAGGCCACCGCCAGGCA 
RH71 (4)       DQA2 probe           TTCCACAGACTTAGATTTGAC 
GH67 (4)       DQA3 probe           TTCCGCAGATTTAGAAGAT 
GH66 (4)       DQA4 probe           TGTTTGCCTGTTCTCAGAC 
GH88 (4)       DQA1.1 probe         CGTAGAACTCCTCATCTCC 
GH89 (4)       DQA1.2, -1.3, -4     GATGAGCAGTTCTACGTGG 
GH77 (4)       DQA1.3 probe         CTGGAGAAGAAGGAGAC 
GH76 (4)       Not DQA1.3           GTCTCCTTCCTCTCCAG 

Name*          Function             Sequence 

RS151          β-Globin primer      b-ATCACTTAGACCTCACCCTG† 
RS152          β-Globin primer      b-ACCTCCCACATTCCCTTTT† 
RS187 (8)      Normal β^110         TAGACCAATAGGCAGAGAG 
RS188 (8)      Mutant β^110         CTCTCTGCCTATTAGTCTA 
RS87 (4)       Normal β^39          CCTTGGACCCAGAGGTTCT 
RS89 (4)       Mutant β^39          AGAACCTCTAGGTCCAAGG 
RS189 (0.33)   Normal β^1-1,1-6     CTTGATACCAACCTGCCCA‡ 
RS190 (0.33)   Mutant β^1-6         TGGGCAGGTTGGCATCAAG 
RS191 (1)      Mutant β^1-1         TGGGCAGATTGGTATCAAG 
RS192 (4)      Normal β^2-1         CCATAGACTCACCCTGAAG 
RS193 (4)      Mutant β^2-1         CTTCAGGATGAGTCTATGG 
RS201 (2)      Normal β^2-745       GCAGAATGGTAGCTGGATT 
RS202 (2)      Mutant β^2-745       GCAGAATGGTACCTGGATT 
RS196 (4)      Normal β^6,8         ACTCCTGAGGAGAAGTCTG‡ 
RS197 (4)      Mutant β^6           GACTCCTGGGAGAAGTCTG 
RS198 (4)      Mutant β^8           TGACTCCTGAGGAGGTCTG 

* Where applicable, the values in parentheses indicate the amount (pmol) of tailed oligonucleotide probe applied to the nylon membrane. 
† b, Biotin covalently attached to 5' end. 
‡ These β-globin oligonucleotide probes each span two sites of potential β-thalassemia mutations and are specific for normal sequences at both 
positions. 

Fig. 2. DNA typing at the HLA-DQA locus. Each tailed oligonucleotide 
probe was spotted onto 12 duplicate membranes, irradiated 
at 40 mJ/cm², hybridized with amplified DQA sequences in 
genomic DNA samples, and treated for color development. The 
specificity of each immobilized oligonucleotide is given at the top, 
and the DQA genotype of each sample is noted at the right. The name, 
amount applied to the membrane, specificity, and sequence of each 
oligonucleotide are listed in Table 1. 

mining the optimal hybridization conditions for each member 
of an initial set of probes, then shortening or lengthening each 
oligonucleotide until they all hybridized under equivalent 
conditions.) These eight probes produce a unique hybridiza- 
tion pattern for each of the 21 possible DQA diploid combi- 
nations. 
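
The claim that eight probes resolve all 21 diploid combinations can be checked by enumeration. The sketch below is illustrative only: the probe-to-allele assignments are an assumed reading of Table 1 and the surrounding text (six alleles: DQA1.1, DQA1.2, DQA1.3, DQA2, DQA3, DQA4), a genotype is modeled as lighting up every probe that matches either allele, and the control probe RH54 is left out because it hybridizes to all alleles.

```python
from itertools import combinations_with_replacement

ALLELES = ["DQA1.1", "DQA1.2", "DQA1.3", "DQA2", "DQA3", "DQA4"]

# ASSUMPTION: probe specificities as read from Table 1 and the text.
PROBES = {
    "GH75": {"DQA1.1", "DQA1.2", "DQA1.3"},  # DQA1 probe
    "RH71": {"DQA2"},
    "GH67": {"DQA3"},
    "GH66": {"DQA4"},
    "GH88": {"DQA1.1"},
    "GH89": {"DQA1.2", "DQA1.3", "DQA4"},
    "GH77": {"DQA1.3"},
    "GH76": set(ALLELES) - {"DQA1.3"},        # everything except DQA1.3
}


def distinct_patterns(probes):
    """Set of positive-probe tuples produced by all diploid genotypes."""
    genotypes = combinations_with_replacement(ALLELES, 2)
    return {
        tuple(name for name, targets in probes.items() if targets & set(g))
        for g in genotypes
    }


n_genotypes = len(list(combinations_with_replacement(ALLELES, 2)))
print(n_genotypes, "genotypes,", len(distinct_patterns(PROBES)), "distinct patterns")

# Dropping GH76 merges DQA1.2/1.3 with DQA1.3/1.3, as noted in the text.
without_gh76 = {k: v for k, v in PROBES.items() if k != "GH76"}
print(len(distinct_patterns(without_gh76)), "distinct patterns without GH76")
```

Under these assumed specificities every genotype gives its own pattern, and removing GH76 loses exactly one distinction.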

Detection of β-Thalassemia Mutations. Although there are 
>54 characterized mutations of the β-globin gene that can 
give rise to β-thalassemia (25), each ethnic group in which 
this disease is prevalent has a limited number of common 
mutations (4, 26, 27). In Mediterranean populations, 8 mutations 
are responsible for >90% of the β-thalassemia alleles 
(4). Oligonucleotides were synthesized that are specific for 
each of these 8 mutations as well as their corresponding 
normal sequences (Table 1). The oligonucleotides were given 
(dT)400 tails with terminal transferase and applied to membranes. 
Since the β-thalassemia mutations are distributed 
throughout the β-globin gene, biotinylated PCR primers that 
amplify the entire gene in a single 1780-base-pair fragment 
were used. (This amplification product encompasses all 
known β-thalassemia mutations, not only the predominant 
Mediterranean mutations examined here.) After hybridization 
and color development, the β-globin genotypes could be 
determined by noting the pattern of hybridization (Fig. 3). 

Unlike the DQA typing system, two oligonucleotide probes 
are needed to analyze each mutation — one specific for the 
normal sequence and one specific for the mutant sequence — 
in order to differentiate normal/mutant heterozygous carriers 
from mutant/mutant homozygotes. A complicating factor in 
this analysis is caused by apparent secondary structure in 
various portions of the relatively long β-globin amplification 
product that interferes with oligonucleotide hybridization. 
The relatively high stringency needed to minimize this sec- 
ondary structure requires the use of longer (e.g., 19-base) 
oligonucleotide probes. Because this constraint would not 
permit varying the length of the oligonucleotides to compen- 
sate for different hybridization efficiencies, the "balancing" 
of signal intensities was accomplished by adjusting the 
amount of each oligonucleotide applied to the membrane. 
This was done by applying various amounts of each oligo- 
nucleotide onto a membrane and then, after hybridization and 
color development, simply selecting the positive spots that 
had similar intensity. 
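
The need for a normal/mutant probe pair at each site can be made concrete with a small sketch. It is a simplified illustration, not the authors' scoring procedure: signals are reduced to present/absent, and the function names and example readings are hypothetical.

```python
def call_locus(normal_signal: bool, mutant_signal: bool) -> str:
    """Interpret one normal/mutant probe pair for a single mutation site."""
    if normal_signal and mutant_signal:
        return "normal/mutant (heterozygous carrier)"
    if mutant_signal:
        return "mutant/mutant (homozygote)"
    if normal_signal:
        return "normal/normal"
    return "no signal (not interpretable)"


def call_with_mutant_probe_only(mutant_signal: bool) -> str:
    """What can be concluded if only the mutant-specific probe is on the filter."""
    return "mutation present, zygosity unknown" if mutant_signal else "mutation not detected"


# Hypothetical readings for one site: (normal spot, mutant spot).
carrier = (True, True)
homozygote = (False, True)

print(call_locus(*carrier), "|", call_locus(*homozygote))
# With the mutant probe alone, the two samples are indistinguishable:
print(call_with_mutant_probe_only(carrier[1]) == call_with_mutant_probe_only(homozygote[1]))
```

The second comparison shows why the normal-sequence probe is required: only the paired reading separates carriers from homozygotes.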

DISCUSSION 

These studies have demonstrated the feasibility of immobi- 
lizing sequence-specific probes onto nylon membranes and 
hybridizing PCR-amplified, biotin-labeled genomic frag- 
ments to the filters to determine the genetic content of the 
DNA sample. We have applied this method to HLA-DQA 
genotyping and to the detection of β-thalassemia mutations. 
Although the number of probes used in the two tests was 
modest (9 for DQA and 14 for β-thalassemia), expanding the 
analyses to include even more oligonucleotides should not be 
difficult. 

The recently described technique of simultaneous ampli- 
fication of several DNA fragments, "multiplex" PCR (28), 
should readily permit the concurrent analysis of multiple 
genetic loci. Using the immobilized-probe format, we have 
been able to simultaneously amplify and type at three loci: the 
HindIII polymorphism in the Gγ-globin gene (29), the Ava II 

Fig. 3. Detection of β-thalassemia mutations. Various amounts of each 
tailed oligonucleotide probe were applied to 12 duplicate nylon filters, 
irradiated at 40 mJ/cm², hybridized with amplified β-globin sequences 
in genomic DNA samples, and treated for color development. The 
β-thalassemia locus that is detected by each immobilized oligonucleotide 
pair is given at the top of the filters. For each filter, the upper row 
contains the oligonucleotide probes that are specific for the normal 
sequence and the lower row contains the oligonucleotides specific for 
the mutant sequences. The β-globin genotype of each sample is noted at 
the right. The name, amount applied to the membrane, specificity, and 
sequence of each oligonucleotide are listed in Table 1. IVS, intervening 
sequence (intron). 

polymorphism in the low density lipoprotein receptor gene 
(30), and the HLA-DQA gene (R.K.S., unpublished observations). 
Other genetic targets whose analysis would be 
simplified by this technique include the detection of somatic 
mutations in the RAS genes, where 6 loci and 66 possible 
alleles occur (31), some of the HLA class II β-chain genes, 
where as many as 25 alleles can be detected (T. Bugawan, S. 
Scharf, and H.A.E., unpublished observations), and β-thalassemia 
in Middle Eastern populations, where in addition 
to the endogenous mutations, Mediterranean and Asian In- 
dian mutations are present at significant frequencies (H. 
Kazazian, personal communication). This format should also 
prove useful for the detection of infectious pathogens or for 
environmental surveys of microorganisms by immobilizing a 
panel of species-specific probes. 

The ability to label probes and detect their hybridization 
without radioactivity is a convenient feature of PCR-based 
DNA tests and, perhaps more importantly, makes this type 
of analysis feasible in areas where radioactive labeling re- 
agents are difficult to obtain. In this report, a biotin tag was 
introduced into the PCR products by means of 5'-biotinylated 
primers. An alternative labeling strategy based on the incor- 
poration of biotinylated dUTP (32) has also been tried and 
shown to be very effective (R.K.S., unpublished observa- 
tions). 

One of the prerequisites of this analytical method is that all 
of the bound oligonucleotides must be sequence-specific 
under the same hybridization conditions. If necessary, this 
requirement can probably be met either by adjusting the 
length, position, and strand specificity of the probes, as was 
done for the HLA-DQA assay, or by varying the amount 
applied to the membrane, as was done for the β-thalassemia 
assay. The presence of tetramethylammonium chloride in the 
hybridization buffer can also serve to minimize the differences 
among immobilized oligonucleotides caused by varying 
base compositions (ref. 33; T. Bugawan, personal communication). 

Although it may entail some initial effort, the end result is 
a simple, robust, and potentially automatable system that can 
be completed (amplification, hybridization, and color development) 
in 3-4 hr. "Reverse dot blots" should be particularly 
valuable for assays where the number of potential 
sequence variations exceeds the number of samples to be 
tested. Even in situations where the number of samples and 
probes are approximately equal, the immobilized-probe for- 
mat may be preferable since many filters can be prepared at 
one time and stored until needed. To date, this typing system 
has been used to determine the HLA-DQA genotype of >300 
unknown samples in forensic and disease-susceptibility stud- 
ies. 
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Goda, D. Spasic, and C.-A. Chang for synthesis of oligonucleotides, 
C. Perez for advice on terminal transferase tailing reaction, C. 
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PCR primers and 0-thalassemia genomic DNA samples, S. Warren 
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T. White, D. Gelfand, and H. Kazazian for critical review of the 
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ABSTRACT A rapid nonradioactive approach to the di- 
agnosis of sickle cell anemia is described based on an allele- 
specific polymerase chain reaction (ASPCR). This method 
allows direct detection of the normal or the sickle cell 0-globin 
allele in genomic DNA without additional steps of probe 
hybridization, ligation, or restriction enzyme cleavage. Two 
allele-specific oligonucleotide primers, one specific for the 
sickle cell allele and one specific for the normal allele, together 
with another primer complementary to both alleles were used 
in the polymerase chain reaction with genomic DNA templates. 
The allele-specific primers differed from each other in their 
terminal 3' nucleotide. Under the proper annealing tempera- 
ture and polymerase chain reaction conditions, these primers 
only directed amplification on their complementary allele. In a 
single blind study of DNA samples from 12 individuals, this 
method correctly and unambiguously allowed for the determi- 
nation of the genotypes with no false negatives or positives. If 
ASPCR is able to discriminate all allelic variation (both 
transition and transversion mutations), this method has the 
potential to be a powerful approach for genetic disease diag- 
nosis, carrier screening, HLA typing, human gene mapping, 
forensics, and paternity testing. 

Sickle cell anemia is the prototype of a genetic disease caused 
by a single base-pair mutation, an A → T transversion in the 
sequence encoding codon 6 of the human β-globin gene. In 
homozygous sickle cell anemia, the substitution of a single 
amino acid (Glu → Val) in the β-globin subunit of hemoglobin 
results in a reduced solubility of the deoxyhemoglobin 
molecule and erythrocytes assume irregular shapes. The 
sickled erythrocytes become trapped in the microcirculation 
and cause damage to multiple organs. 

Kan and Dozy (1) were the first to describe the diagnosis 
of sickle cell anemia in the DNA of affected individuals based 
on the linkage of the sickle cell allele to an Hpa I restriction 
fragment length polymorphism. Later, it was shown that the 
mutation itself affected the cleavage site of both Dde I and 
Mst II and could be detected directly by restriction enzyme 
cleavage (2, 3). Conner et al. (4) described a more general 
approach to the direct detection of single nucleotide variation 
by the use of allele-specific oligonucleotide hybridization. In 
this method, a short synthetic oligonucleotide probe specific 
for one allele only hybridizes to that allele and not to others 
under appropriate conditions. 

All of the above approaches are technically challenging, 
require a reasonably large amount of DNA, and are not very 
rapid. The polymerase chain reaction (PCR) developed by 
Saiki et al. (5) provided a method to rapidly amplify small 
amounts of a particular target DNA. The amplified DNA 
could then be readily analyzed for the presence of DNA 
sequence variation (e.g., the sickle cell mutation) by allele-specific 
oligonucleotide hybridization (6), restriction enzyme 
cleavage (5, 7), ligation of oligonucleotide pairs (8, 9), or 
ligation amplification (10). PCR increased the speed of 
analysis and reduced the amount of DNA required for it but 
did not change the method of analysis of DNA sequence 
variation. In this paper, we investigated whether PCR could 
be done in an allele-specific manner such that the presence or 
absence of an amplified fragment provides direct determina- 
tion of genotype. 

PCR utilizes two oligonucleotide primers that hybridize to 
opposing strands of DNA at positions spanning a sequence of 
interest. A DNA polymerase [either the Klenow fragment of 
Escherichia coli DNA polymerase I (5) or Thermus aquaticus 
DNA polymerase (11)] is used for sequential rounds of 
template-dependent synthesis of the DNA sequence. Prior to 
the initiation of each new round, the DNA is denatured and 
fresh enzyme is added in the case of the E. coli enzyme. In 
this manner, exponential amplification of the target sequences 
is achieved. We reasoned that if the 3' nucleotide of 
one of the primers formed a mismatched base pair with the 
template due to the existence of single nucleotide variation, 
amplification would take place with reduced efficiency. 
Specific primers would then direct amplification only from 
their homologous allele. After multiple rounds of amplifica- 
tion, the formation of an amplified fragment would indicate 
the presence of the allele in the initial DNA. 
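
The scale of exponential amplification can be illustrated with the standard idealized relation, fold increase = (1 + E)^n for n cycles at per-cycle efficiency E. This is a textbook simplification rather than a model from this paper: it ignores plateau effects, and the 0.8 efficiency used below is a hypothetical value chosen only for illustration.

```python
def fold_amplification(cycles: int, efficiency: float = 1.0) -> float:
    """Idealized fold increase after `cycles` rounds.

    efficiency = 1.0 means every template is copied in every cycle (perfect doubling).
    """
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return (1.0 + efficiency) ** cycles


# 25 cycles is the number used in this study; 0.8 is a hypothetical efficiency.
print(f"ideal, 25 cycles: {fold_amplification(25):.3g}-fold")
print(f"E = 0.8, 25 cycles: {fold_amplification(25, 0.8):.3g}-fold")
```

Even a modest drop in per-cycle efficiency changes the final yield by more than an order of magnitude over 25 cycles, which is why a band is expected only when priming is efficient.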

MATERIALS AND METHODS 

Oligonucleotide Synthesis. Oligonucleotides were synthe- 
sized on an Applied Biosystems 380B DNA synthesizer by 
the phosphoramidite method. They were purified by electro- 
phoresis on a urea/polyacrylamide gel followed by high- 
performance liquid chromatography as described (12). 

Source and Isolation of Human DNA. All genomic DNA 
samples with the exception of the β-thalassemia DNA were 
isolated from the peripheral blood of appropriate individuals. 
The β-globin genotype of these individuals was previously 
determined by hybridization with allele-specific oligonucle- 
otide probes (4) as well as by hemoglobin electrophoresis. 
Thalassemia major DNA was obtained from an Epstein-Barr 
virus-transformed lymphocyte cell line obtained from the 
National Institute of General Medical Sciences Human Ge- 
netic Mutant Cell Repository (Camden, NJ). Thalassemia 
DNA was isolated from the cultured cells. All DNA prepa- 
rations were performed according to a modified Triton X-100 
procedure followed by proteinase K and RNase A treatment 
(13). The average yield of genomic DNA was ≈25 µg per ml 
of blood. 

PCR. Hβ14A (5'-CACCTGACTCCTGA) and BGP2 (5'-AATAGACCAATAGGCAGAG) 
at a concentration of 0.12 µM were used as the primer set 
for the amplification of the normal β-globin gene (a primer set). 
Similarly, 0.12 µM Hβ14S (5'-CACCTGACTCCTGT) and 0.12 µM BGP2 were 
used as the primer set for the amplification of the sickle cell 
gene (s primer set). Both primer sets directed the amplification 
of a 203-base-pair (bp) β-globin allele-specific fragment. 
As an internal positive control, all reaction mixtures contained 
an additional primer set for the human growth hormone 
gene comprised of 0.2 µM GHPCR1 (5'-TTCCCAACCATTCCCTTA) 
and 0.2 µM GHPCR2 (5'-GGATTTCTGTTGTGTTTC) (hGH primer set). 
GHPCR1 and GHPCR2 
direct the amplification of a 422-bp fragment of the human 
growth hormone gene. All reactions were performed in a vol 
of 50 µl containing 50 mM KCl, 10 mM Tris-HCl (pH 8.3), 1.5 
mM MgCl2, 0.01% (wt/vol) gelatin, template DNA (0.5 
µg/ml), and 0.1 mM each dATP, dCTP, dGTP, and TTP. 
Reactions were carried out for 25 cycles at an annealing 
temperature of 55°C for 2 min, a polymerization temperature 
of 72°C for 3 min, and a heat-denaturation temperature of 
94°C for 1 min on a Perkin-Elmer Cetus DNA thermal cycler. 
At the end of the 25 rounds, the samples were held at 4°C in 
the thermal cycler until removed for analysis. 

Analysis of the PCR Products. An aliquot (15 µl) from each 
of the completed PCR reactions was mixed with 5 µl of 5x 
Ficoll loading buffer (1x = 10 mM Tris-HCl, pH 7.5/1 mM 
EDTA/0.05% bromophenol blue/0.05% xylene cyanol/3% 
Ficoll) and subjected to electrophoresis in a 1.5% agarose gel. 
Electrophoresis was performed in 89 mM Tris-HCl/89 mM 
borate/2 mM EDTA buffer for 2 hr at 120 V. At the 
completion of electrophoresis, the gel was stained in ethidium 
bromide (1.0 µg/ml) for 15 min, destained in water for 10 min, 
and photographed by ultraviolet trans-illumination. 

RESULTS 

Experimental Design. The scheme describing allele-specific 
PCR (ASPCR) is shown in Fig. 1. Primer P1 is designed such 
that it is complementary to allele 1 but the 3'-terminal 
nucleotide forms a single base-pair mismatch with the DNA 
sequence of allele 2 (Fig. 1B, *). Under appropriate annealing 
temperature and PCR conditions, there is normal amplifica- 
tion of the P1-P3 fragment with DNA templates containing 
allele 1 (homo- or heterozygous), while there is little or no 
amplification from DNA templates containing allele 2. In a 
similar way, a primer (P2) could be designed that would allow 

Fig. 1. Schematic representation of the ASPCR. P1 and P3, 
synthetic oligonucleotide primers that anneal to opposing strands of 
a single copy gene. P1 anneals to the region of a gene in the region 
of a DNA sequence variation such that its terminal 3' nucleotide base 
pairs with the polymorphic nucleotide of the template. P1 is completely 
complementary to allele 1 (A) but forms a single base-pair 
mismatch with allele 2 at the 3'-terminal position due to one or more 
nucleotide differences relative to allele 1 (B). 



the specific PCR amplification of allele 2 but not allele 1 
DNA. 

We designed two 14-nucleotide-long allele-specific primers, 
Hβ14S and Hβ14A, complementary to the 5' end of the 
sickle cell and normal β-globin genes, respectively. The 
oligonucleotide primers differ from each other by a single 
nucleotide at the 3' end, Hβ14S having a 3' T and Hβ14A 
having a 3' A corresponding to the base pair affected by the 
sickle cell mutation. The oligonucleotide primer BGP2 (7) 
complementary to the opposite strand 3' of the allele-specific 
primers was used as the second primer for PCR. The 
amplification product with these primer pairs was 203 bp. 
Also included in each reaction was a second pair of primers 
that directed the amplification of a 422-bp fragment of the 
human growth hormone gene. These primers were included 
as an internal positive control. 
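
The primer design can be checked directly from the sequences given in Materials and Methods. The short sketch below compares the two allele-specific primers position by position; the variable and function names are illustrative.

```python
# Allele-specific primer sequences, 5'->3', as printed in Materials and Methods.
H_BETA_14A = "CACCTGACTCCTGA"  # normal-allele primer
H_BETA_14S = "CACCTGACTCCTGT"  # sickle-cell-allele primer


def differing_positions(a: str, b: str) -> list:
    """Zero-based positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


diffs = differing_positions(H_BETA_14A, H_BETA_14S)
print("length:", len(H_BETA_14A), "differences at:", diffs)
print("3' bases:", H_BETA_14A[-1], "vs", H_BETA_14S[-1])
```

The only difference falls at the last (3'-terminal) position, which is the property the discrimination depends on.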

Discrimination Between the Normal and Sickle Cell Alleles. 
Genomic DNA was isolated from peripheral blood leukocytes 
of individuals of known β-globin genotypes (β^A/β^A, 
β^A/β^S, β^S/β^S). In addition, DNA was isolated from an 
Epstein-Barr virus-transformed cell line containing a homozygous 
deletion of the β-globin gene (β^th/β^th). DNA was 
subjected to 25 rounds of PCR using either the sickle 
cell-specific primer set (Hβ14S and BGP2) or the normal 
gene-specific primer set (Hβ14A and BGP2) using an annealing 
temperature of 55°C. The results are shown in Fig. 2A. It 
can be seen that a 203-bp fragment is observed using the 
sickle cell-specific primer set only with the β^A/β^S and β^S/β^S 
genomic DNA templates and not with the β^A/β^A genomic 
DNA templates. Conversely, the normal gene-specific primer 
set only gave rise to an amplification product with β^A/β^S 
and β^A/β^A genomic DNA templates. As expected, the 
thalassemia DNA did not give rise to a β-globin gene 
amplification product with either primer set. The internal 
growth hormone gene control gave rise to a 422-bp fragment 
in all samples, demonstrating that in no case was the absence 
of a globin-specific band due to a failure of the PCR. 

In a single blind study, the DNA from 12 individuals with 
different β-globin genotypes was analyzed with the two 
primer sets. The results are shown in Fig. 2B. Individuals 1, 
2, 3, and 5 are predicted to be β^A/β^A; individuals 6, 9, 10, and 
11 are predicted to be β^S/β^S; and individuals 4, 7, 8, and 12 
are predicted to be β^A/β^S. In each case, the genotype was 
correctly and unambiguously predicted from the pattern of 
fragment amplification (see legend to Fig. 2 for clinically 
diagnosed genotype). 
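
The interpretation of the band patterns can be summarized as a small decision function. This is a sketch of the logic described in the text, not code from the study: bands are reduced to present/absent, each of the two reactions (a and s primer sets) is assumed to carry its own 422-bp growth hormone control, and the function name and example lanes are hypothetical.

```python
def call_aspcr_genotype(a_203: bool, a_422: bool, s_203: bool, s_422: bool) -> str:
    """Call a β-globin genotype from the a- and s-primer-set reactions.

    a_203 / s_203: 203-bp β-globin band in the a / s reaction.
    a_422 / s_422: 422-bp growth hormone control band in the a / s reaction.
    """
    if not (a_422 and s_422):
        return "PCR failure (internal control band missing)"
    if a_203 and s_203:
        return "β^A/β^S"
    if a_203:
        return "β^A/β^A"
    if s_203:
        return "β^S/β^S"
    return "no β-globin product (e.g., homozygous deletion)"


# Hypothetical lanes: (a_203, a_422, s_203, s_422)
for lanes in [(True, True, False, True),
              (False, True, True, True),
              (True, True, True, True),
              (False, True, False, True),
              (False, False, False, True)]:
    print(lanes, "->", call_aspcr_genotype(*lanes))
```

Checking the control band first mirrors the role of the growth hormone primers in the text: an absent globin band is only informative when the control shows that the reaction worked.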

DISCUSSION 

The results presented above indicate the potential usefulness 
of ASPCR for sickle cell diagnosis. The method is rapid and 
the result is obtained without the use of radioactivity, since 
all that is required is to visualize the band on a gel with 
ethidium bromide staining. It should be possible to further 
improve the technique by elimination of the gel separation 
step. One strategy for this is shown in Fig. 3. As proposed 
recently by Yamane et al. (15), the two primers for the PCR 
could be labeled differently, one with biotin and one with a 
fluorescent group such as fluorescein or tetramethyl rhoda- 
mine. The product of the PCR could be captured on strepta- 
vidin-agarose and the presence of the amplified sequence 
could be detected with the fluorescence. In this case, if one 
allele-specific primer were labeled with one fluorescent group 
and the other were labeled with a different one, then the 
ASPCR could be done simultaneously. 

In this study, we have used PCR primers that form either 
an A·A or a T·T mismatch. It is not clear that other 
mismatches will give equally effective discrimination. Since 
G·T mismatches are more stable than other mismatches (16), 
G·T should probably be avoided when designing primers. 

Fig. 2. (A) Identification of the normal (β^A) and the sickle cell 
(β^S) alleles by ASPCR. Normal (β^A/β^A), homozygous sickle cell 
(β^S/β^S), heterozygous sickle cell (β^A/β^S), and homozygous 
β-thalassemia (β^th/β^th) DNA samples (0.5 µg each) served as template 
using either the normal (a primer set) or the sickle cell (s primer set) 
for the ASPCRs. As an internal positive control, all reaction mixtures 
contained an additional primer set for the human growth hormone 
gene (hGH primer set) that directed the amplification of a 422-bp 
fragment of the human growth hormone gene. After amplification, 15 
µl from each reaction mixture was subjected to electrophoresis in a 
1.5% agarose gel for 2 hr at 120 V. Ethidium bromide staining of the 
agarose gel was used to detect PCR amplified fragments. Positive 
β-globin ASPCR can be identified by the presence of a 203-bp 
fragment using either the a or the s primer set reaction. As a marker 
for the globin-specific fragment, 0.3 µg of plasmid pHβ^A containing 
the normal human globin gene (β^A) was amplified with the a primer 
set alone (M1). As a marker for the growth hormone-specific 
fragment, 0.1 µg of plasmid pXGH5 containing a 3.8-kilobase 
fragment of the human growth hormone gene (14) was amplified with 
the growth hormone primer set (hGH) alone (M2). (B) A single blind 
trial using ASPCR to diagnose the β-globin genotype of genomic 
DNA samples. Genomic DNA samples from 12 individuals (4 each 
of normal, homozygous, and heterozygous sickle cell individuals) 
were randomly assigned numbers 1-12 by the hematology laboratory 
and blinded to the investigators. ASPCR was performed using both 
the normal (a) and the sickle cell-specific (s) primer sets as described 
above. Genotypes were identified as homozygous normal (β^A/β^A) if 
the single 203-bp fragment appears exclusively in the a primer set 
reaction, as homozygous sickle cell (β^S/β^S) if the 203-bp fragment 
appears only in the s primer set, or as heterozygous sickle cell trait 
(β^A/β^S) if the fragment appears in both reactions. The genotypes of 
these DNA samples were previously determined by hemoglobin 
electrophoresis (results not shown). The genotypes of the 12 individuals 
are as follows: 1, 2, 3, and 5, β^A/β^A; 6, 9, 10, and 11, β^S/β^S; 
4, 7, 8, and 12, β^A/β^S. 

This can be done by designing the primer so that it is
complementary to the strand with which it forms an A·C
mismatch. It may be possible to use a competition approach,
as we have previously used to improve the discrimination
provided by oligonucleotide hybridization probes (17). In this
case, a competitive primer could be designed that was not
able to prime, for example, by including in it a 3'
dideoxynucleotide or a 3' ribonucleotide that has been oxidized. A
mixture of a labeled allele-specific primer complementary to
allele 1 plus an unlabeled priming-defective primer
complementary to allele 2 should then allow the specific
amplification of allele 1.

Fig. 3. Schematic representation of a dual labeling system
suitable for the detection of the ASPCR products. One of the
oligonucleotide primers is labeled at the 5' end with a fluorescent
group such as fluorescein or tetramethyl rhodamine (L) and the other
primer is labeled with biotin (B). The ASPCR amplification product
would therefore have the 5' end labeled on both strands. The biotin
is suitable for capturing the amplified fragment on a
streptavidin-agarose column, while the fluorescent group is suitable
for measuring the amount of fragment produced.

The ability of an oligonucleotide to prime on a DNA
template is governed by two kinetic variables: the rate at
which the annealed primer dissociates from the template
before initiating polymerization (r_off) and the rate at which the
DNA polymerase extends the primer (r_pol). Efficient priming
in PCR should take place whenever r_pol > r_off, the addition of
the first few nucleotides to the primer then greatly stabilizing
the oligonucleotide-template complex and allowing continued
extension of the primer. For a given primer, r_pol is an
intrinsic property of the polymerase. Studies with E. coli
DNA polymerase I have suggested that this polymerase may
be able to discriminate between primers that either do or do
not form a mismatch with the template at the 3'-terminal
nucleotide (18). In this case, r_pol for the mismatched primer
was slower than r_pol for the perfectly matched primer. For the
present study, we designed the allele-specific primers such
that the allele-specific nucleotide in the template was
complementary to the 3'-terminal nucleotide of the primer. In this
way, the 3' nucleotide of the primer specific for one allele
would form a mismatch with the other allele. This design
allows one to take advantage of the difference between r_pol of
the perfectly matched and mismatched primers as well as to
optimize primer concentration, priming temperature, primer
length, and primer sequence, all of which will affect the
difference in the r_off for the two allele-specific primers.

We reasoned that a set of conditions should exist such that
r_pol > r_off for the perfectly matched primer, while r_pol < r_off
for the mismatched primer. The results shown here clearly
demonstrate this to be true. In our study, the allele-specific
primers were 14 nucleotides long. We found (data not shown)
that discrimination between the β^A and β^S alleles was not
possible at low annealing temperatures (e.g., 44°C and 50°C).
Presumably the short length of the oligonucleotides as well as
the high annealing temperature combined to provide the
discrimination.

Taq polymerase is well suited for using ASPCR for the
discrimination of two alleles that differ by a single nucleotide
because it lacks a 3' → 5' exonuclease activity (19). Such an
activity would correct the mismatched base pair in the
mismatched primer-template complex and then permit efficient
priming with the one-nucleotide-shorter primer. Since
the specificity of the ASPCR is determined in the initial
several cycles of PCR, the fact that the primer remains 
uncorrected enhances the discrimination of the reaction. 
PCR is an exponential reaction; the yield of product is very 
dependent on the efficiency of each round (5). Only very 
minor changes in the efficiency of each round of amplification 
have profound effects on the overall yield after many rounds. 
For example, if the efficiency of the reaction with the 
perfectly matched primer is 90% and with the mismatched 
primer is 60%, there would be 73-fold more product produced 
in the reaction with perfectly matched primer than with the 
mismatched primer. 
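
The arithmetic behind this example can be checked with a short sketch. It assumes the usual model in which each cycle multiplies the amount of product by (1 + efficiency). The cycle count of 25 is an assumption, chosen because it reproduces the quoted 73-fold figure; the text itself does not state it. The function name `fold_difference` is illustrative only.

```python
def fold_difference(eff_matched: float, eff_mismatched: float, cycles: int) -> float:
    """Ratio of product yields after `cycles` rounds of PCR.

    Assumes each cycle multiplies the amount of product by (1 + efficiency),
    so the yield after n cycles is proportional to (1 + efficiency) ** n.
    """
    return ((1.0 + eff_matched) / (1.0 + eff_mismatched)) ** cycles


# 90% vs. 60% per-cycle efficiency; 25 cycles is an assumed cycle count.
ratio = fold_difference(0.90, 0.60, 25)
print(f"{ratio:.1f}-fold more product with the perfectly matched primer")
```

Under these assumptions the ratio comes out at a little over 73, and it grows quickly with additional cycles, which is the point the paragraph makes about small per-round differences.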

The ASPCR should find application in the fields of genetic 
diagnosis, carrier screening, HLA typing, and any other 
nucleic acid-based diagnostic in which the precise DNA 
sequence of the priming site is diagnostic for the target. In the 
case of HLA typing, recent advances have used PCR amplification
followed by allele-specific oligonucleotide hybridization
for the determination of DR, DQ, and DP alleles (6, 20-22).
It should be possible to use ASPCR for the direct analysis
of HLA types. 

We have recently proposed a process for the simultaneous 
determination of multiple polymorphic loci based on the 
concept of producing locus-specific amplification products 
each with a unique length (23). In such a system, since 
ASPCR would produce allele-specific products, the simulta- 
neous analysis of the genotype of the target DNA at multiple 
loci should be possible.

tional Science Foundation (R.B.W.), D.Y.W. is a M.D./Ph.D.

ABSTRACT DNA diagnostics, the detection of specific
DNA sequences, will play an increasingly important role in
medicine as the molecular basis of human disease is defined.
Here, we demonstrate an automated, nonisotopic strategy for 
DNA diagnostics using amplification of target DNA segments 
by the polymerase chain reaction (PCR) and the discrimination 
of allelic sequence variants by a colorimetric oligonucleotide 
ligation assay (OLA). We have applied the automated 
PCR/OLA procedure to diagnosis of common genetic diseases, 
such as sickle cell anemia and cystic fibrosis (ΔF508 mutation),
and to genetic linkage mapping of gene segments in the human
T-cell receptor β-chain locus. The automated PCR/OLA
strategy provides a rapid system for diagnosis of genetic, 
malignant, and infectious diseases as well as a powerful ap- 
proach to genetic linkage mapping of chromosomes and foren- 
sic DNA typing. 



The study of DNA sequence variants in humans is playing an 
important role in diagnosis of genetic and malignant diseases 
(1,2). The analysis of DNA polymorphisms also serves as the 
fundamental tool in attempts to construct genetic linkage 
maps (3, 4) and in forensic analyses (5, 6). Since the majority 
of DNA sequence variants and polymorphisms are single 
nucleotide substitutions (1, 2), diagnostic techniques must 
accurately discriminate single base changes. 

Single base variations in DNA sequences can be detected 
by a variety of techniques including Southern blot analysis (7) 
for restriction fragment length polymorphisms, allele-specific 
oligonucleotide hybridization (8), denaturing gradient gel 
electrophoresis (9), chemical cleavage of mismatched het- 
eroduplexes (10), conformational changes in single strands 
(11), and allele-specific priming of the polymerase chain 
reaction (PCR) (12-14). These techniques have several dis- 
advantages for automating DNA diagnosis, which include the 
use of radioactivity, the requirement for various hybridiza- 
tion conditions, and the need for electrophoresis or centrif- 
ugation. 

The analysis of DNA sequence variants has been greatly 
facilitated by the development of rapid methods to exponen- 
tially amplify specific DNA or RNA targets. Diagnostic 
targets can be amplified by PCR (15-17) or by other available 
methods (18-21). Amplification generates specific targets 
with high signal/noise ratios and permits the use of less 
sensitive nonisotopic reporters in DNA analysis. 

An alternative strategy for DNA diagnosis, the oligonu- 
cleotide ligation assay (OLA), employs two adjacent oligo- 
nucleotides (20-mers), a 5' biotinylated probe (with its 3' end 
at the nucleotide to be assayed) and a 3' reporter probe 
(22-24). The two oligonucleotides are hybridized to target 
DNA and, if there is perfect complementarity, the enzyme
DNA ligase covalently joins the 5' biotinylated probe and the
3' reporter probe. If the probes and target are mismatched at 
their junction, a covalent bond is not formed. Capture of the 
5' biotinylated probe on immobilized streptavidin and anal- 
ysis for covalently linked 3' reporters determine the nature of 
the probe-target interaction (matched or mismatched). The 
ligase assay uses a standard set of conditions to distinguish all 
nucleotide mismatches, and product analysis does not re- 
quire electrophoresis or centrifugation (22). In this report, we 
describe a strategy for automating DNA diagnosis that com- 
bines target amplification by PCR with a nonisotopic analysis 
of DNA sequence variants by OLA. 
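
The all-or-none logic of the ligation assay can be sketched as follows. This is a deliberately simplified model, not the authors' software: hybridization and ligation are reduced to exact string matching, and the target is written in the same sense as the probes (that is, as the complement of the strand the probes actually anneal to). The probe sequences are the β-globin ligation probes listed in Table 1 with their biotin, phosphate, and digoxigenin labels omitted; the two 40-nucleotide target strings and all function names are illustrative assumptions.

```python
# Beta-globin ligation probes from Table 1 (5'->3', labels omitted).
BIOTIN_PROBES = {
    "beta_A": "ATGGTGCACCTGACTCCTGA",  # 3' end matches the normal allele
    "beta_S": "ATGGTGCACCTGACTCCTGT",  # 3' end matches the sickle cell allele
}
REPORTER_PROBE = "GGAGAAGTCTGCCGTTACTG"


def ligates(biotin_probe: str, reporter_probe: str, target_sense: str) -> bool:
    """True when the two probes abut with perfect complementarity.

    Simplification: `target_sense` is written in the same sense as the
    probes, so perfect base pairing at the junction is equivalent to the
    joined probe sequence occurring in the target string.
    """
    return (biotin_probe + reporter_probe) in target_sense


def alleles_detected(target_sense: str) -> list:
    """Names of the biotin-labeled probes that would be joined to the reporter."""
    return sorted(name for name, probe in BIOTIN_PROBES.items()
                  if ligates(probe, REPORTER_PROBE, target_sense))


# Illustrative amplified targets differing at the single assayed nucleotide.
NORMAL_TARGET = "ATGGTGCACCTGACTCCTGAGGAGAAGTCTGCCGTTACTG"
SICKLE_TARGET = "ATGGTGCACCTGACTCCTGTGGAGAAGTCTGCCGTTACTG"

print(alleles_detected(NORMAL_TARGET))
print(alleles_detected(SICKLE_TARGET))
```

In the real assay the readout is an absorbance rather than a boolean, so a threshold on signal replaces the exact-match test used here.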

MATERIALS AND METHODS 

Robotic Workstation. A Biomek 1000 workstation (Beckman)
equipped with multipipet tools and a multibulk tool was
used to perform all pipetting, aspirating, and washing pro- 
cedures. The workstation has been modified with a solenoid 
to switch wash solutions during the ELISA. All reagents for 
sample processing were stored in sterile 96-minitube cas- 
settes. 

DNA Samples. DNA from humans with α1-antitrypsin,
β-globin, and cystic fibrosis variants was obtained from F.
Hejtmancik (Baylor University), from K. Tanaka (Harbor
Hospital) and J. Korenberg (Cedars-Sinai Hospital), and from
A. Osher and E. Hsu (Children's Hospital), respectively, and
prepared as described (22). DNA for amplification of human
T-cell receptor β-chain (TCRβ) gene segments was obtained
by gently scraping cells from the lining of the buccal cavity
with a sterile toothpick. Buccal cells were dislodged into a
minitube containing 10 μl of sterile H2O, covered with 75 μl
of mineral oil, and placed into a 96-minitube cassette for
handling by the robotic workstation. Cells were lysed with 20
μl of 0.1 M KOH and 0.1% Triton X-100 at 65°C for 20 min
and neutralized with 20 μl of 0.1 M HCl and 0.1% Triton
X-100.

Oligonucleotides. Amplification primers and ligation probes 
were assembled by using standard phosphoramidite chemistry 
on an Applied Biosystems 380A DNA synthesizer. Ligation 
probes were modified with a 5' biotin group as described (15) 
or chemically phosphorylated with 5' Phosphate-ON (Clon- 
tech) according to the manufacturer's directions. Modified 
probes were purified by reverse-phase high-performance liq- 
uid chromatography. Phosphorylated oligonucleotide probes 
(500 pmol) were labeled with dUTP-digoxigenin by mixing 100
mM potassium cacodylate, 2 mM CoCl2, 200 μM dithiothreitol,
2.5 μl of dUTP-digoxigenin (Boehringer Mannheim),
and 2 μl of adenosine triphosphate (40 μM) with 70 units of
terminal deoxynucleotidyltransferase (Collaborative Research)
for 1 hr at 37°C. Free dUTP-digoxigenin was removed
by two successive ethanol precipitations.

Abbreviations: PCR, polymerase chain reaction; OLA, oligonucleotide
ligation assay; TCRβ, T-cell receptor β chain; CFTR, cystic
fibrosis transmembrane conductance regulator; V, variable; D,
diversity; J, joining; C, constant; STS, sequence-tagged site.

DNA Amplification. The robotic workstation was programmed
to assemble PCR reagents [5 μl containing 20 mM
Tris-HCl (pH 8.3), 100 mM KCl, 3 mM MgCl2, 20 ng of bovine
serum albumin per ml, the four deoxynucleotide triphosphates
each at 400 μM, 0.5 μM amplification primers, 0.1%
Triton X-100, and 0.05 unit of Thermus aquaticus DNA
polymerase per well], genomic DNA (5 μl at 2 ng/μl in sterile
distilled H2O containing 0.1% Triton X-100), and 70 μl of light
mineral oil in a flexible U-bottomed 96-well microtiter plate
(Falcon). Genomic DNA samples were denatured at 93°C for
4 min and amplified by 40 cycles of 93°C for 30 sec, 55°C
[cystic fibrosis transmembrane conductance regulator
(CFTR) and TCRα constant (Cα) gene segments] or 61°C
(β-globin and α1-antitrypsin gene segments) for 45 sec, and
72°C for 90 sec in a microtiter plate thermal cycler (MJ
Research, Watertown, MA). For amplification of TCRβ gene
segments, 15 μl of PCR reagents (as described above) containing
all six amplification primers, 15 μl of the lysed buccal
samples, and 70 μl of mineral oil were added to a flexible
microtiter plate. Targets were denatured at 93°C for 4 min and
amplified by 20 cycles of 30 sec at 93°C, 45 sec at 61°C, and
90 sec at 72°C. Five microliters from these reaction mixtures
was used to initiate a second round of amplification for each
of the individual TCRβ gene segments (40 cycles; 30 sec at
93°C, 45 sec at 61°C, and 90 sec at 72°C).
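
As a rough check on how long these programs run, the cycling conditions can be written down as data and the programmed hold times summed. This is only a sketch: the class and function names are hypothetical, and ramp times between temperatures, which depend on the thermal cycler, are ignored.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CyclingProgram:
    """One initial denaturation followed by a repeated set of steps."""
    initial_denaturation_s: int
    cycles: int
    steps: Tuple[Tuple[int, int], ...]  # (temperature in deg C, hold in seconds)

    def hold_time_s(self) -> int:
        """Total programmed hold time in seconds (ramp times are ignored)."""
        per_cycle = sum(seconds for _temp_c, seconds in self.steps)
        return self.initial_denaturation_s + self.cycles * per_cycle


def genomic_program(annealing_c: int) -> CyclingProgram:
    """40-cycle program from the text; annealing at 55 or 61 deg C by target."""
    return CyclingProgram(4 * 60, 40, ((93, 30), (annealing_c, 45), (72, 90)))


# First-round multiplex amplification of the TCR beta gene segments.
TCRB_FIRST_ROUND = CyclingProgram(4 * 60, 20, ((93, 30), (61, 45), (72, 90)))

for name, program in (("genomic, 55 C annealing", genomic_program(55)),
                      ("TCR beta, first round", TCRB_FIRST_ROUND)):
    print(f"{name}: {program.hold_time_s() / 60:.0f} min of programmed holds")
```

The annealing temperature does not change the hold time, so the 55°C and 61°C variants of the 40-cycle program take the same programmed time.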

Ligation Assays. Ligation reaction mixtures were assembled
by the robotic workstation. Forty-five microliters of 0.25
M NaOH containing 0.1% Triton X-100 was added to amplified
DNA samples. Ligation probes (200 fmol each) in 10 μl
of 2x ligase buffer (100 mM Tris-HCl, pH 7.5/20 mM
MgCl2/2 mM spermidine/2 mM adenosine triphosphate/10
mM dithiothreitol) and 50% formamide were added to a
U-bottomed 96-well microtiter plate. DNA samples were
neutralized with 45 μl of 0.25 M HCl and six 10-μl aliquots
were added to the microtiter plate containing the ligation
probes. Samples were covered with 70 μl of mineral oil,
denatured at 93°C for 2 min, cooled, and returned to the
workstation for the addition of 5 μl of T4 DNA ligase (5
units/ml) (Amersham) in 1x ligase buffer. Ligations were
done at room temperature (RT) for 15 min. Reactions were
stopped by adding 10 μl of 0.25 M NaOH per well and, after
2 min at RT, 4 μl of 3 M sodium acetate (pH 6.5) per well.
Samples were transferred to a 96-well flat-bottomed microtiter
plate (Falcon) coated with streptavidin [60 μl of streptavidin
(100 μg/ml) or avidin (100 μg/ml) (Vector Laboratories)
for 1 hr at 37°C] and blocked 20 min (RT) before use with 200
μl of 100 mM Tris-HCl, pH 7.5/150 mM NaCl/0.05% Tween
20 (buffer A) per well with 0.5% dry milk and 100 μg of salmon
sperm DNA per ml. Biotinylated probes were captured at RT
for 30 min, and the plate was washed twice with 0.01 M
NaOH and 0.05% Tween 20 and once with buffer A. Thirty
microliters of anti-digoxigenin antibodies (diluted 1:1000;
Boehringer Mannheim) in buffer A with 0.5% dry milk was
added to each microtiter well. Plates were incubated 30 min
(RT) and washed six times with buffer A. Substrate (30 μl of
BRL ELISA amplification system per well) was added, the
plates were incubated 15 min (RT), and 30 μl of amplifier was
added. Spectrophotometric absorbances were taken at 490
nm by a Bio-Tek (Burlington, VT) plate reader and absorbances
were directly entered into an IBM-XT computer.

Linkage Analysis. Observed haplotype frequencies were 
calculated for genetic linkage analysis of TCR/3 gene seg- 
ments with a myriad haplotype program (25). The probability 
of linkage disequilibrium was calculated based on the χ^2
distribution of the Q statistic described by Hedrick et al. (26).



RESULTS 

The Automated PCR/OLA Strategy. Our strategy for au- 
tomated gene analysis is shown in Fig. 1. A Biomek 1000 
robotic workstation was used to (i) prepare targets and
assemble reagents for DNA amplification, (ii) mix and ligate
5' biotinylated probes and 3' digoxigenin-labeled reporter
probes on amplified DNA targets using T4 DNA ligase, (iii)
capture 5' biotinylated probes on streptavidin-coated microtiter
plates, (iv) wash plates, and (v) detect the digoxigenin
reporter coupled to biotin-labeled probes by an ELISA. 
Altogether, processing time for 96 samples from entry to 
computer read-out takes <7 hr. Overnight amplification 
permits processing of ligation assays from 192 DNA samples 
in a single day (1200 reactions, triplicates for two alleles). 

Amplification Primers and Ligation Probes. A panel of 
amplification primers and ligation probes for known se- 
quence variants in human DNA have been synthesized (Table 
1). Two sets of probes detect mutations that cause common 
genetic diseases in homozygous individuals, sickle cell ane- 
mia and CF (27, 28). Another set detects a common mutation 
in the α1-antitrypsin gene that, in homozygous individuals,
leads to a predisposition for cirrhosis of the liver in childhood 
and emphysema in adults (29). The remaining probes detect

[Figure 1 panel labels: 2, Denature, Anneal and Ligate Modified Oligonucleotides on Amplified Target; 3, Capture Biotinylated Oligonucleotides and Perform ELISA for Digoxigenin.]

Fig. 1. Schematic diagram of the steps in the automated PCR/OLA
procedure performed with a robotic workstation. The assay
contains three steps: 1, DNA target amplification; 2, analysis of
target nucleotide sequences with biotin (B)-labeled and digoxigenin
(D)-labeled oligonucleotide probes and T4 DNA ligase (L); 3, capture
of the biotin (B)-labeled probes on streptavidin (SA)-coated
microtiter wells and analysis for covalently linked digoxigenin (D) by
using an ELISA procedure with alkaline phosphatase (AP)-conjugated
anti-digoxigenin (αD) antibodies and a substrate (S).

Table 1. Nucleotide sequence of the amplification primers and ligation probes used in automated DNA analysis



Genomic region   Amplification primers         Biotin-labeled            Reporter-labeled             Target detected by
amplified                                      ligation probe            ligation probe               ligation probes

β-Globin         1. CAACTTCATCCACGTTCACCTTGCC  B-ATGGTGCACCTGACTCCTGA    pGGAGAAGTCTGCCGTTACTG-D      1. β^A
                 2. AGGGCAGGAGCCAGGGCTGGG      B-ATGGTGCACCTGACTCCTGT                                 2. β^S
α1-Antitrypsin   1. TCAGCCTTACAACGTGTCTCTGCTT  B-GGCTGTGCTGACCATCGACG    pAGAAAGGGACTGAAGCTGCT-D      1. M
                 2. GTATGGCCTCTAAAAACATGGCCCC  B-GGCTGTGCTGACCATCGACA                                 2. Z
CFTR             1. CAGTGGAAGAATGGCATTCTGTT    B-ATTAAAGAAAATATCATCTT    pTGGTGTTTCCTATGATGAAT-D      1. Non-F508
                 2. GGCATGCTTTGATGACGCTTCTG    B-ACCATTAAAGAAAATATCAT                                 2. ΔF508
Cα               1. CCTTGAAGCTGGGAGTGG         B-GAAACGAAGAAACTGAGGCCA   pCACAGCTAATGAGTGAGGAAGA-D    1. Cα A
                 2. GAGCTAAGAGAGCCGTACTGG      B-GAAACGAAGAAACTGAGGCCC                                2. Cα B
Vβ6.7_1          1. AAGGGAAAGGATGTAGAG         B-TTTACTGGTACCGACAGAGC    pCTGGGGCAGGGCCTGGAGTT-D      1. Vβ6.7_1 A
                 2. CTGGCACAGAGATACACGGCC      B-TTTACTGGTACCGACAGAGG                                 2. Vβ6.7_1 B
Vβ6.7_2          1. AAGGGAAAGGATGTAGAG         B-TCTGCAGAGAGGACTGGGGG    pATCCGTCTCCACTCTGACGA-D      1. Vβ6.7_2 A
                 2. CTGGCACAGAGATACACGGCC      B-TCTGCAGAGAGGACTGGGGA                                 2. Vβ6.7_2 B
Vβ1              1. GAGTCACACAAACCCCAAAGCACCT  B-AGGCCTCCAGTTCCTCATTCAG  pTATTATAATGGAGAAGAGAGAGCA-D  1. Vβ1 A
                 2. GCTGCTGGCACAGAAATACAAAGCT  B-AGGCCTCCAGTTCCTCATTCAC                               2. Vβ1 B
Cβ               1. CATTATGGTCCTTTCCCGG        B-ACCAGGACCAGACAGCTCTC    pAGAGCAACCCTAGCCCCATTAC-D    1. Cβ A
                 2. AGCTCCACGTGGTCGGGGT        B-ACCAGGACCAGACAGCTCTT                                 2. Cβ B

Ligation reactions were performed with a mixture of a biotin-labeled and reporter-labeled probe for each specific allele.



polymorphisms in the human TCRβ and TCRα loci (refs. 30
and 31; C. Whitehurst, P. Charmley, L.H., and D.A.N.,
unpublished data). Most of these probes detect single nucle- 
otide substitutions in a specific DNA target. However, one 
set of probes detects a 3-base-pair (bp) deletion in the gene 
encoding CFTR (28) and represents a model for the detection 
of sequence deletions by OLA. 

Analysis of DNA Sequence Variants. As a model for DNA 
diagnosis by the PCR/OLA procedure, we obtained genomic 
DNAs from 32 individuals of known genotype. The robotic 
workstation was used to assemble PCR reagents and genomic 
DNA samples in a 96-well microtiter plate. After amplifica- 
tion, ligations were performed in triplicate for each allele, and 
the immobilized probes were analyzed for the presence of 
digoxigenin. An example of a microtiter plate obtained from 
this process is shown in Fig. 2. Amplified targets from 
homozygous and heterozygous individuals for the indicated 
nucleotide substitutions (β-globin, α1-antitrypsin, and TCR
Cα) or deletion (CFTR) were used. The assay clearly identifies
which alleles 1 and/or 2 (Table 1) were present in each
of the amplified samples (Fig. 2). Fig. 3 shows the mean 
absorbances obtained from ligation assays on amplified DNA 
targets from eight different individuals for each of the ana- 
lyzed gene segments (32 individuals altogether). Mean ab- 
sorbances from different individuals ranged from 0.38 to 1.17. 
We have found that mean absorbances from the ligation 
assays reflect the amount of target present in an amplified 
DNA sample. In this regard, the colorimetric assay is quite 
sensitive and can detect 3 fmol of ligated product (data not 
shown). The high signal/noise ratios (10:1-200:1) obtained 
with this procedure also permit simple data processing to 
define the genotype of an amplified DNA sample by calculating
the ratio of the mean absorbance for each allele in the
ligation assay. Furthermore, since the outcome of the PCR/ 
OLA procedure is based on the mean absorbance of triplicate 
ligation reactions, the chance of error arising from spurious 
false-negative or false-positive wells is also minimized (false- 
negative or false-positive wells < 0.2% in 4000 reactions; data 
not shown). 
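
Genotype assignment from the ratio of mean absorbances might be sketched as below. The text does not give the cutoffs that were used, so `ratio_cutoff`, `min_signal`, and `floor` are assumed values chosen only for illustration, and the absorbance readings in the usage lines are invented examples rather than data from the study.

```python
from statistics import mean


def call_genotype(allele1_abs, allele2_abs,
                  ratio_cutoff=5.0, min_signal=0.2, floor=0.01):
    """Assign a genotype from replicate absorbances for two allele-specific assays.

    ratio_cutoff, min_signal, and floor are illustrative assumptions:
    - floor keeps the ratio finite when a mean is close to zero,
    - min_signal rejects samples in which neither assay gave a signal,
    - ratio_cutoff separates homozygotes from heterozygotes.
    """
    m1 = max(mean(allele1_abs), floor)
    m2 = max(mean(allele2_abs), floor)
    if max(m1, m2) < min_signal:
        return "no call"
    ratio = m1 / m2
    if ratio >= ratio_cutoff:
        return "1/1"
    if ratio <= 1.0 / ratio_cutoff:
        return "2/2"
    return "1/2"


# Invented triplicate readings, for illustration only.
print(call_genotype([0.95, 1.02, 0.98], [0.02, 0.03, 0.02]))
print(call_genotype([0.60, 0.65, 0.62], [0.55, 0.60, 0.58]))
```

Averaging the triplicates before taking the ratio is what lets a single spurious well be absorbed without changing the call, in line with the low error rate reported above.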

Genetic Linkage Analysis of TCRβ Genes. The automated
PCR/OLA protocol has been extended to include the preparation
of DNA samples by the robotic workstation. Amplified
DNA targets from human buccal samples were used to
determine the frequency and genetic linkage of four DNA
sequence polymorphisms in the human TCRβ locus as shown
in Fig. 4. The human TCRβ locus is composed of several gene
segments, variable (V), diversity (D), joining (J), and
constant (C) genes, which span >600 kilobases (kb) of DNA
(Fig. 4) (32, 33). Using data obtained from the automated
PCR/OLA procedure on these 96 samples, we found that two
Vβ6.7 polymorphisms were in complete linkage disequilibrium
(P < 10^-14). This finding was not surprising since these
variants are separated by a small physical distance (100 bp).
Although the exact location of the Vβ6.7 gene segment in the
TCRβ locus is not known, analysis of available cosmid and
YAC clones by gene-specific PCR suggests that Vβ6.7 is
probably located 5' to the Vβ1 gene segment. The three TCR
polymorphisms (Vβ6.7, Vβ1, and Cβ), physically spanning at
least 600 kb, appeared to be in linkage equilibrium with one
another. Indeed, the expected haplotype frequencies calculated
assuming linkage equilibrium were very close to those
observed (P < 0.81) (Table 2). These findings confirm those
recently reported in a study of TCR polymorphisms detected
as restriction fragment length polymorphisms and may suggest
that hot spots of recombination exist in the TCRβ locus
(34).

Fig. 3. Mean spectrophotometric absorbances (+1 SD) from
triplicate ligation reactions performed by the automated PCR/OLA
procedure on amplified DNA samples obtained from eight donors for
each gene analyzed (32 DNA samples total).



DISCUSSION 

Automated analysis of DNA polymorphisms and variants by
PCR/OLA has many advantages over existing approaches to
DNA diagnostics. Small numbers of cells (cheek scraping) or
DNA samples (10 ng) are sufficient for analysis. Only small
fragments of DNA (a few hundred base pairs) are required.
Therefore, partially degraded DNA is still useful. The reagents
are stable and easily obtained, and nonisotopic reporter
groups are used. The entire assay is performed in
microtiter wells, thus avoiding the use of centrifugation or
electrophoresis. The assay yields high signal/noise ratios and
a simple readout that is easily transferred to a computer for
storage and analysis; no measurements of DNA fragment
sizes are necessary. All of the tested sequence variants
(nucleotide transitions and transversions, and a deletion)
could be discriminated by OLA using a standard set of
conditions. The initial PCR amplification facilitates the
discrimination of polymorphisms in individual members of a
multigene family (e.g., the TCR Vβ6.7 gene segment is one of
nine highly similar members of the Vβ6 subfamily). The two
successive levels of sequence discrimination, PCR and then
OLA, enhance signal/noise ratios and reduce the likelihood
of error, particularly in the analysis of polymorphisms in
multigene families. The steps in the assay are automatable,
eliminating the need for human intervention (and possible
mistakes) in a tedious and repetitious process. With automation,
high throughput is possible. At present, we can process
1200 ligation reactions per day with a single operator and
robotic workstation, and, in the near future, further automation
with a robotic arm will permit processing of 6000
reactions per day.

Expected haplotypes were calculated assuming random allelic
association; e.g., AAA = 0.66 x 0.83 x 0.66 x 192 = 69.
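
The expected haplotype count quoted for Table 2 (AAA = 0.66 x 0.83 x 0.66 x 192 = 69) can be reproduced, and extended to all eight three-locus haplotypes, with the sketch below. It assumes that each locus is biallelic, so the B allele frequency is one minus the A allele frequency, and that 192 is the number of chromosomes typed (two for each of the 96 individuals). The order in which the three frequencies map onto the loci is not stated in the surviving text, so the tuple order here is simply that of the quoted product.

```python
from itertools import product

# A-allele frequencies in the order they appear in the quoted example.
FREQ_A = (0.66, 0.83, 0.66)
CHROMOSOMES = 192  # assumed: 96 individuals x 2 chromosomes


def expected_count(haplotype, freq_a=FREQ_A, n=CHROMOSOMES):
    """Expected number of chromosomes carrying `haplotype` (e.g. "ABA")
    under random allelic association (linkage equilibrium)."""
    expected = float(n)
    for allele, p in zip(haplotype, freq_a):
        expected *= p if allele == "A" else 1.0 - p
    return expected


expected_table = {"".join(h): expected_count(h) for h in product("AB", repeat=3)}
for haplotype, count in expected_table.items():
    print(f"{haplotype}: {count:.1f}")
```

Comparing these expected counts with the observed counts is the basis of the equilibrium test reported in the Results.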

The automated PCR/OLA assay can be applied in many
different basic research and clinical areas. Genetic diseases
fall into several different categories including the common
and widespread mutations of sickle cell disease, α1-antitrypsin
or CF, and newly arising spontaneous mutations
such as Lesch-Nyhan disease (35). Clearly, PCR/OLA facilitates
the analysis of the common mutations, either in
screening at-risk members of families with diseases or for
more general carrier screening purposes. Rapid techniques
are being developed to identify the sequence variations of
newly arising mutations (35, 36). Once identified, the combined
PCR/OLA procedure can be used to follow the inheritance
of these specific mutations in affected families. Many
genes cause a predisposition toward disease. This is true of
the α1-antitrypsin mutation described above. Recently, it has
been demonstrated that certain TCR and HLA haplotypes
may predispose humans to certain autoimmune diseases such
as multiple sclerosis (37-39). Therapeutic strategies are being
developed to circumvent these predispositions (40-42).
Therefore, automated screening may be useful in the near
future to identify the genes associated with disease predispositions
in which some form of preventive therapy can be
initiated.

[Figure 4 graphic not reproduced; recoverable labels: Centromere; scale bar, 100 kb.]

Fig. 4. Schematic diagram of the human TCRβ locus giving the relative order of the V, D, J, and C gene segments. DNA polymorphisms
in three indicated gene segments were analyzed in 96 individuals. Their location, where known, is shown (arrow up). The nucleotide substitutions
analyzed and the frequency for each variant in these samples are shown.

Genetics: Nickerson et al.

The automated PCR/OLA procedure provides a powerful 
approach to high-resolution genetic linkage mapping of the 
human genome or other complex genomes. For this approach,
sequence-tagged sites (STSs) (43) from specific chromosomal
regions (e.g., the TCRβ locus) or from a specific
chromosome (e.g., STSs obtained from random clones of a
flow-sorted chromosome library) would be scanned for internal
DNA sequence polymorphisms (9-11) to obtain a set
of polymorphic STSs. Once acquired, polymorphic STSs can
be rapidly ordered by analysis of large multigeneration families
or by single-sperm typing (44, 45) using the automated
PCR/OLA system. 

The availability of human polymorphic STSs will also 
provide a set of markers for automated forensic typing. For 
example, with a set of maximally informative biallelic mark- 
ers (50:50 distribution in random mating populations) from 
each of the 22 human autosomes, the probability that two 
individuals would have identical DNA fingerprints (i.e., the
same set of the 44 alleles) is ~1 in 10^10. The automated
PCR/OLA procedure eliminates most of the limitations as- 
sociated with forensic typing by conventional Southern blot 
analysis (e.g., the measurement of DNA fragment sizes, the 
requirement for high quality DNA, and the use of radioiso- 
topes). 
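
The order of magnitude of the match probability quoted above can be explored with a short calculation. The model below is an assumption rather than the authors' stated derivation: it takes 22 independent biallelic loci, Hardy-Weinberg genotype proportions, and two unrelated individuals, and the function names are illustrative.

```python
def genotype_match_probability(allele_freq: float) -> float:
    """Probability that two unrelated individuals share the same genotype
    at one biallelic locus, assuming Hardy-Weinberg proportions."""
    p, q = allele_freq, 1.0 - allele_freq
    genotype_freqs = (p * p, 2.0 * p * q, q * q)
    return sum(g * g for g in genotype_freqs)


def profile_match_probability(allele_freqs) -> float:
    """Match probability across loci, assuming the loci are independent."""
    prob = 1.0
    for freq in allele_freqs:
        prob *= genotype_match_probability(freq)
    return prob


# One maximally informative (50:50) marker on each of the 22 autosomes.
match = profile_match_probability([0.5] * 22)
print(f"match probability: {match:.2e} (about 1 in {1 / match:.1e})")
```

With 50:50 markers the per-locus match probability is 0.375, and across 22 loci this simple model gives a value within an order of magnitude of the ~1 in 10^10 stated in the text. Markers with less even allele frequencies would be less informative.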

Other applications for automated DNA diagnosis by the 
PCR/OLA procedure include HLA typing, the analysis of 
recessive or dominant oncogenes, and the identification of 
infectious pathogens. The use of commercially available 
thermostable ligases and automated ligation amplification 
reactions in the direct detection of single copy genes can also 
be explored. Moreover, multiple nonisotopic reporter groups 
may be developed that will be simultaneously analyzed in a 
single microtiter well. This raises the possibility of multi- 
plexing the OLA procedure to the point where initially both 
alleles can be analyzed together and eventually multiple 
biallelic loci can be typed in a single well. These and other 
improvements, such as a single instrument to perform the 
entire analysis, will greatly increase the throughput and 
potential applications of automated DNA diagnostics.

generated a substrate that was extended by
the polymerase to a complete 50-bp duplex
molecule (Fig. 4). This confirms the result
shown in Fig. 2B that Rad1-Rad10 removes
the 3' single-stranded tail, and indicates
that Rad1-Rad10 cleavage products contain
3'-OH groups, the required substrate for
extension by DNA polymerase. Hence,
Rad1-Rad10 endonuclease products are suitable
substrates for a necessary subsequent step
in both the SSA recombination and NER
models.

The application of synthetic oligonucleotides
in combination with nucleic acid-specific
enzymes has brought simplicity
and convenience to molecular genetic 
analyses. There is, however, a need for 
methods in which oligonucleotides can be
used for localized detection of single-copy 
gene sequences and for distinction among 
sequence variants in microscopic specimens. 
Such methods would help to bridge the 
analytic gap between specific gene se- 
quences and subcellular structures. We have 
developed oligonucleotide probe molecules 
that should be useful for localized detection 
of specific nucleic acids. These "padlock" 
probes are composed of two target-comple- 
mentary segments, connected by a linker 
that may carry detectable functions. The 
two ends of the linear oligonucleotide 
probes are brought in juxtaposition by hy- 
bridization to a target sequence. This jux- 
taposition allows the two probe segments to 
be covalently joined by the action of a 
DNA ligase. Because of the helical nature 
of DNA, circularized probes are wound 
around the target strand, topologically con- 
necting probes to target molecules through 
catenation, in a manner similar to padlocks. 
The requirement for simultaneous hybrid- 
ization of two different probe segments to 
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target molecules provides for high specific- 
ity of detection in complex populations of 
nucleic acids (1). Moreover, the act of ligation
permits facile distinction among similar
target sequence variants as terminally
mismatched probes are poor substrates for 
ligases (1, 2). Finally, the covalent catena- 
tion of probe molecules to target sequences 
described here results in the formation of a 
hybrid that resists extreme washing condi- 
tions, serving to reduce nonspecific signals 
in genetic assays. 

Probes useful for circularization experi- 
ments were constructed by solid phase syn- 
thesis of oligonucleotides that contained 
two hybridizing regions of 20 nucleotides 
each, connected by a 50-nucleotide-long 
linker segment (Fig. 1). Phosphate groups 
were added at the 5' ends of the molecules 
as required for enzymatic ligation. Alterna- 
tively, residues of hexaethylene glycol 
(HEG) were incorporated in the linker seg- 
ment during standard solid phase synthesis 
(3). The HEG residues served to reduce the 
number of synthetic steps required to span 
the ends of the two target-complementary 
segments. 
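
The probe layout described above can be illustrated with a minimal Python sketch. It assumes the sequences reported in the legend to Fig. 2 (a 40-nucleotide target flanked by five T residues on each side) and the 50-T linker of Fig. 1B; the function names `reverse_complement` and `design_padlock` are hypothetical and are not part of the original work. The sketch only shows how the two 20-nucleotide arms relate to the target so that the probe's 5' and 3' ends meet at the centre of the target sequence.

```python
# Sketch: derive a padlock probe from a target sequence (DNA, 5'->3').
# Function names are illustrative assumptions, not from the paper.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def design_padlock(target: str, arm_len: int = 20, linker: str = "T" * 50) -> str:
    """Build a padlock probe (5'->3') for a target of length 2 * arm_len.

    The 5' arm pairs with the first half of the target and the 3' arm
    with the second half, so both probe ends abut at the target midpoint.
    """
    if len(target) != 2 * arm_len:
        raise ValueError("target must be exactly twice the arm length")
    five_prime_arm = reverse_complement(target[:arm_len])
    three_prime_arm = reverse_complement(target[arm_len:])
    return five_prime_arm + linker + three_prime_arm


# Oligonucleotide target from the Fig. 2 legend; the 40-nt core lies
# between the two TTTTT flanks.
target_oligo = "TTTTTCTAGAGTCGACCTGCAGGCATGCAAGCTTGGCACTGGCCGTTTTT"
core = target_oligo[5:-5]
probe = design_padlock(core)
print(len(probe), probe[:20], probe[-20:])
```

Under these assumptions the derived 5' and 3' arms agree with the probe sequence printed in the Fig. 2 legend.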

Cyclizable probes were designed to de- 
tect a 40-nucleotide target sequence, rep- 
resented either by an oligonucleotide 
molecule or by the polylinker sequence of 
the single-stranded form of the circular 
cloning vector M13 mp18. Ligation products
could be separated by denaturing
polyacrylamide gel electrophoresis (Fig. 
2A). In the presence of the oligonucleo- 
tide target, linear probes were efficiently 
converted to circular molecules with a 
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Nucleotide sequence information derived from DNA segments of the human and other 
genomes is accumulating rapidly. However, it frequently proves difficult to use such 
short DNA segments to identify clones in genomic libraries or fragments in blots of 
the whole genome or for in situ analysis of chromosomes. Oligonucleotide probes, 
consisting of two target-complementary segments, connected by a linker sequence, 
were designed. Upon recognition of the specific nucleic acid molecule the ends of the 
probes were joined through the action of a ligase, creating circular DNA molecules 
catenated to the target sequence. These probes thus provide highly specific detection 
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distinct rate of migration. Probes interact- 
ing with M13 target molecules were con- 
verted to a species catenated to and there- 
fore migrating with the large M13 mole- 
cule during denaturing gel electrophore- 
sis. As the probes were labeled by the 



addition of a radioactive phosphate group 
at the 5' terminus, only ligated molecules 
retained their label after treatment with 
alkaline phosphatase. Circular oligonu- 
cleotides are insensitive to digestion with 
exonuclease VII, which attacks at free 5'
or 3' ends of DNA strands (4). Depending
on how the probes are labeled, phospha- 
tases or exonucleases could be used to 
remove any signal arising from unreacted 
probes in various assays, thus reducing 
background (5). 

We also investigated the consequences 
of cyclically repeating the probe hybridiza- 
tion and ligation reaction. The amount of 
cyclized probe molecules increased linearly 
with the number of ligation cycles when a 
short oligonucleotide target was used (Fig. 
2B). By contrast, under the same conditions 
the maximal number of probes were bound 
to the closed, circular M13 target molecule 
in a single ligation cycle; thereafter the 
signal decreased, probably because of scis- 
sion of the single-stranded target molecule 
during heat denaturation. Thus, a single 
probe may be catenated to each circular 
target molecule. This indicates that circu- 
larized probe molecules, constrained to one- 
dimensional diffusion along the target 
strand during heat denaturation, rapidly oc- 
cupy the correct target sequence before 
new probes bind to this sequence when 
the temperature is lowered. Repeated cy- 
cles of ligation can, however, increase the 
probability that any target sequence will 
be detected by probe molecules specific 
for that target, particularly when allele- 
specific probes are used to distinguish 
among sequence variants. 

Investigators can use oligonucleotide 
probe ligation reactions to distinguish 
among related DNA sequences by study- 
ing their ability to serve as templates for 
ligation of oligonucleotides complemen- 
tary to one or the other sequence variant 
(1). Whereas probes specific for one of
the two sequence variants may hybridize 
stably to either of the two sequences, only 
target molecules correctly base-paired to 
the juxtaposed ends of the probes can 
assist in the ligation. We investigated the 
capacity of the padlock probes to distin- 
guish between a normal and a mutant 
DNA sequence in plasmid clones immobilized
on nylon membranes (Fig. 3). Plasmids
containing the ΔF508 variant of the
cystic fibrosis transmembrane conduc- 
tance regulator (CFTR) gene or the cor- 
responding normal gene segment were 
spotted on nylon membranes and subject- 
ed to probe hybridization and ligation. 
The mutation removes 3 base pairs (bp) 
(6) corresponding to the 3' end of the 
circularizable probe. Probe molecules spe- 
cific for the normal sequence gave rise to 
a signal only when reacted with the normal
sequence but not with the ΔF508
variant of the CFTR gene when probe 
ligation was followed by denaturing wash- 
es in 0.2 M NaOH for 5 min. This strin- 
gent wash (to interrupt hybridization be- 
tween DNA molecules) permitted effi- 



Fig. 1. Structure of a 
padlock probe interact- 
ing with its target se- 
quence. (A) Molecular 
model of the probe-target
complex. The molecular 
model was prepared on a 
Silicon Graphics work- 
station, with Insight II 
(Biosym Technologies). 
(B) Sequence composi- 
tion of a probe, specific 
for a segment present in 
the M13 cloning vector 
sequence. At the 5' end 
of the probe, beginning 
with a phosphate group, 
20 target-complementa- 
ry nucleotide positions 
are shown in red. Directly 
contiguous with these is 
a linker segment of 50 T residues, shown in green. Finally, the 20 nucleotides at the 3' end of the 
probe are yellow. The target sequence is shown in blue. 







Fig. 2. Analysis by gel electrophoresis of the target-dependent
circularization of an oligonucleotide probe. (A) A 90-bp
oligonucleotide probe
(5'-TGCCTGCAGGTCGACTCTAG(T)50CGGCCAGTGCCAAGCTTGCA-3'; see also Fig. 1B)
was designed such that its 5' and 3' ends would hybridize
adjacent to each other to a segment in the polylinker region of
the M13 mp18 cloning vector. The probe was gel-purified and
5'-phosphorylated by T4 polynucleotide kinase (New England
Biolabs) and [γ-32P]ATP (3000 Ci/mmol, Dupont). To ensure that
most or all 5' ends were phosphorylated, a second kinase
incubation was performed in the presence of a 20-fold excess of
adenosine triphosphate (ATP). The labeled probe (6 pmol) was
incubated with 3 pmol of either of two different templates: the
7.2-kb, single-stranded, circular M13 mp18 molecule or an
oligonucleotide
(5'-TTTTTCTAGAGTCGACCTGCAGGCATGCAAGCTTGGCACTGGCCGTTTTT-3') that
contained the same 40-bp target sequence, in 100 µl of 20 mM
tris-HCl (pH 8.3), 25 mM KCl, 10 mM MgCl2, 1 mM NAD+, 0.01%
Triton X-100, and 200 U of Ampligase (Epicentre Technologies).
The reactions were heated to 90°C (1 min), then cooled to 55°C
(5 min) and chilled on ice. Samples (10 µl) were taken from the
ligation reactions and treated with either 0.5 U of calf
intestinal alkaline phosphatase (CIP; New England Biolabs) or
0.1 U of exonuclease VII (Exo VII; Gibco/BRL). (B) The same probe
(9 pmol) was subjected to repeated cycles of ligation, separated
by heat denaturation steps, in the presence of 0.3-pmol
oligonucleotide target (open circles) or the circular
single-stranded target molecule (filled circles). Radioactive
ligation products, accumulated after the indicated number of
cycles, were separated by gel electrophoresis on a 6% denaturing
polyacrylamide gel and quantitated with a Phosphorimager
(Molecular Dynamics).
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cient distinction between the allelic vari- 
ants, as only cyclized probes remain 
bound to the membrane. By contrast, a 
stringent but nondenaturing wash of the 
same probes in a solution of 2% SDS in 
0.1× standard saline citrate (SSC) gave
poor distinction between the two target 
sequences. Because signal strength is pre- 
served under conditions that prevent hy- 
bridization between complementary DNA 
strands, nonspecifically trapped probe 
molecules may be efficiently removed, re- 
sulting in a reduction of the level of 
background in gene detection reactions. 

As indicated in Fig. 2B, circularized 
probe molecules are free to travel consider- 
able distances along the target strands dur- 
ing denaturing washes. To measure the dis- 
tance traveled, probe-cyclization reactions 
were carried out on equivalent numbers of 
covalently closed target molecules or mole- 
cules that had been linearized at variable 
distances from the probe-complementary 
sequence before being immobilized on nylon
membranes (Fig. 3B). Few probe molecules
that were cyclized around target
strands interrupted approximately 150 nu- 
cleotides from the probe-complementary se- 
quence remained after denaturing washes. 
By contrast, strands digested 850 nucleo- 
tides from the probe-complement retained 
similar numbers of probes as did uninter- 
rupted strands. The greater preservation of 
signal upon denaturing washes of probes 
bound to the longer linear target molecules 
probably reflects the increased likelihood 
that target molecules were cross-linked to the 
membrane on both sides of the site where the 
probe was catenated. This trapping of circu- 
larized probes by catenation to linear target 
molecules, in combination with the specific 
detection afforded by the requirement that 
two different probe segments simultaneously 
react with the target sequence, should be of 
value in procedures such as DNA blotting or 
for screening genomic libraries with short 
probe sequences. 

Currently, oligonucleotide probes find 
limited applications for in situ analysis of 
gene sequences in metaphase chromo- 
somes. This is a consequence of problems 
both with specificity of detection and 
sensitivity of visualization. A circularizable
probe, specific for a repeated centromeric
motif characteristic of human chromosome
12 (7), was used for in situ hybridization
followed by ligation in human
metaphase chromosome preparations. A
wide range of washing conditions, including
ones that remove specifically hybridizing
oligonucleotide or longer probes, preserved
signals from in situ circularized
probe molecules and permitted efficient 
distinction from alphoid repeat sequences 
present on other human chromosomes 
(Fig. 4). Given sufficiently sensitive techniques
for detection of probe molecules,
the high specificity of padlock probes in
conjunction with the reduced nonspecific
background observed should permit detection
of short, single-copy DNA sequences
in human chromosomes. Increased
signal could be obtained by secondary
ligation of detectable molecules to
the linker segment of bound probes. Thus, 
oligonucleotide probes could be used to 
screen for the presence of known muta- 
tions in loci distributed along the chro- 
mosomes, by means of color-coded probes 
specific for normal and mutant sequence 
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Fig. 3. Distinction of target DNA molecules immobilized on nylon membranes by means of a circularizable 
probe. (A) Fifteen femtomoles of two plasmids containing the normal or a 3-bp deleted variant of the CFTR 
gene were spotted on nylon membranes (PALL). The filters were treated with 0.1% SDS in boiling water and
left for 10 min at room temperature; filters were then washed twice with phosphate-buffered saline (PBS) (9)
to remove plasmids that had not been fixed to the membrane. Thirty femtomoles of a circularizable probe (5'-P
TGGTGTTTCCTATGA((HEG)2C-B)4(HEG)2AAGAAATATCATCTT-3') per microliter was hybridized to the
membranes for 30 min in 5× SSPE (9), 5× Denhardt's solution (9), and salmon sperm DNA (500 µg/ml). The
probe contained NH2-modified C residues to which biotin had been coupled by means of a biotin-NHS ester
(Clontech Laboratories) as described (10). Next, the membranes were incubated for 1 hour at room
temperature in a solution of 10 mM tris, pH 7.5, 10 mM Mg(Ac)2, 50 mM KAc, 0.2 M NaCl, 1 mM ATP, and
0.15 U of T4 DNA ligase per microliter (Pharmacia). The membranes were washed in a solution of 2% SDS in
1× SSPE for 30 min, next in either 2% SDS in 0.1× SSC for 30 min for a stringent wash, or, for a denaturing
wash, in 0.2 M NaOH for 5 min, and then in 1× SSPE, 2% SDS, for 30 min. A signal was generated by
incubating the membranes for 5 min in streptavidin-horseradish peroxidase conjugate (0.05 µg/ml; Boehringer
Mannheim) in 2× SSPE, 2% SDS, rinsing in PBS for 30 to 60 min, and then soaking in ECL solution
(Amersham) for 1 min. The chemoluminescent signal was recorded on X-omat-S film. (B) Plasmids containing
the normal (N) or mutant (M) variants of the target molecules were digested with restriction enzymes at the 
indicated distances from the sequence complementary to the probe or were left undigested. After immobili- 
zation on nylon membranes, the plasmids were probed by hybridization with a circularizable oligonucleotide, 
followed by a ligation step and a denaturing wash in 0.2 M NaOH. 

Fig. 4. Detection of a chromosome 12-specific 
repeated sequence in human metaphase chro- 
mosomes, by in situ hybridization and ligation of a 
biotinylated circularizable probe. Metaphase 
chromosome preparations were obtained from a 
human lymphocyte culture by standard tech- 
niques of colcemide treatment, hypotonic shock, 
and fixation in methanol + acetic acid. In situ 
hybridization and ligation were performed by a 
modification of the procedure described (11). The
slides were treated with ribonuclease A at 200
µg/ml in 2× SSC (9) for 1 hour at 37°C, dehydrated
in a series of 70, 90, 95, and 99% ice-cold
ethanol washes for 2 min each, and air-dried. The
chromosome preparations were then denatured
in 70% formamide, 2× SSC at 70°C for 2 min;
immediately dehydrated in a series of 70, 90, 95,
and 99% ice-cold ethanol washes for 2 min each;
and air-dried. Circularizable probe (10 fmol/µl) specific for an alphoid repeat-motif present on chromosome
12 (5'-P AAATCTCCAACTGGAAACTG((HEG)2(C-B))7(HEG)2ATTTGGTCTCAAAGTGATTG-3')
was hybridized for 18 hours at 37°C in 2× SSC, 20% formamide and salmon sperm DNA (1 µg/µl) in a
25-µl volume on each slide. A 5-min wash in 2× SSC at 37°C and a brief wash in 10 mM tris, pH 7.5, 10
mM Mg(Ac)2, 50 mM KAc, 10 mM ATP preceded ligation in the same buffer, containing T4 DNA ligase
(0.085 U/µl) for 1 hour at 37°C. The slides were washed twice in 2× SSC with 20% formamide at 37°C
for 5 min each, followed by two washes in 2× SSC and once in PN buffer (0.1 M NaH2PO4, 0.1% NP-40,
adjusted to pH 8.0 with 0.1 M Na2HPO4) at 37°C, 5 min each. Bound probes were visualized by means
of fluorescein-labeled avidin, followed by a layer of biotinylated antibodies against avidin, both at 5 ng/ml
(Vector Laboratories), and a second layer of fluoresceinated avidin. All incubations were performed in PN 
buffer containing 5% nonfat milk at 37°C for 20 min followed by three washes in PN buffer at room 
temperature for 5 min each. The metaphase chromosomes were stained with propidium iodide and 
photographed with a Nikon Axiofot microscope. 
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Localization of a Breast Cancer Susceptibility
Gene, BRCA2, to Chromosome 13q12-13

Richard Wooster,* Susan L. Neuhausen,* Jonathan Mangion,*
Yvette Quirk,* Deborah Ford,* Nadine Collins, Kim Nguyen, 
Sheila Seal, Thao Tran, Diane Averill, Patty Fields, Gill Marshall, 
Steven Narod, Gilbert M. Lenoir, Henry Lynch, Jean Feunteun, 
Peter Devilee, Cees J. Cornelisse, Fred H. Menko, Peter A. Daly, 
Wilma Ormiston, Ross McManus, Carole Pye, Cathryn M. Lewis, 
Lisa A. Cannon-Albright, Julian Peto, Bruce A. J. Ponder, 
Mark H. Skolnick, Douglas F. Easton,† David E. Goldgar,
Michael R. Stratton 

A small proportion of breast cancer, in particular those cases arising at a young age, is
due to the inheritance of dominant susceptibility genes conferring a high risk of the
disease. A genomic linkage search was performed with 15 high-risk breast cancer families
that were unlinked to the BRCA1 locus on chromosome 17q21. This analysis localized a
second breast cancer susceptibility locus, BRCA2, to a 6-centimorgan interval on chro- 
mosome 13q12-13. Preliminary evidence suggests that BRCA2 confers a high risk of 
breast cancer but, unlike BRCA1, does not confer a substantially elevated risk of ovarian 
cancer. 



variants. Furthermore, probe cyclization 
reactions depend on an intramolecular 
reaction as opposed to reaction between 
pairs of independent probe molecules as 
in amplification by the polymerase chain 
reaction. Thus, there should be fewer prob- 
lems with nonspecific reactions resulting 
from interactions between noncognate pairs 
of probe segments with cyclizable probes. 
The present probe design should permit the 
simultaneous analysis of multiple gene se- 
quences in a DNA sample. 

In conclusion, the nucleic acid probe pre- 
sented here permits highly specific detection 
of nucleotide sequences and, although the 
target is not amplified, highly sensitive detec- 
tion is possible through efficient reduction of 
nonspecific signal. Circularizable probes 
should be applicable in a number of other 
contexts, including the detection of specific 
RNA molecules expressed in tissue sections as 
T4 DNA ligase can assist in ligation reactions 
involving RNA strands (8). Moreover, immo- 
bilized padlock probes could be useful for pre- 
parative purposes, such as trapping circular 
target molecules from solution when screen- 
ing gene libraries. 

REFERENCES AND NOTES 



1. U. Landegren, R. Kaiser, J. Sanders, L. Hood, Science
241, 1077 (1988); A. M. Alves and F. J. Carr,
Nucleic Acids Res. 16, 8723 (1988); F. Barany, Proc.
Natl. Acad. Sci. U.S.A. 88, 189 (1991).

2. D. Y. Wu and R. B. Wallace, Gene 76, 245 (1989).

3. A. Jäschke, J. P. Fürste, D. Cech, V. A. Erdmann,
Tetrahedron Lett. 34, 301 (1993).

4. G. Prakash and E. T. Kool, J. Am. Chem. Soc. 114, 
3523 (1992); N. G. Dolinnaya et al., Nucleic Acids 
Res. 21, 5403 (1993).

5. The upper faint bands observed in lanes 3 and 4 
probably represent small amounts of linear dimer 
molecules, appearing as a consequence of ligation 
of one end each of two different probe molecules. 
This material proved susceptible to exonuclease
digestion. The extra lower bands in these lanes
were not reproducible between experiments.
Small amounts of uncatenated, circular probes appearing
in lane 7 most likely were a consequence
of endonuclease activity in the exonuclease prep- 
aration. With increasing amounts of exonuclease, 
catenated probes are lost and more free circular 
probes appear (M. Nilsson et al., unpublished
data). 

6. J. R. Riordan et al., Science 245, 1066 (1989). 

7. H. F. Willard and J. S. Waye, Trends Genet. 3, 192
(1987); A. G. Matera and D. C. Ward, Hum. Mol.
Genet. 1, 535 (1992); A. Baldini et al., Am. J. Hum.
Genet. 46, 784 (1990). 

8. N. P. Higgins and N. R. Cozzarelli, Methods Enzymol.
68, 50 (1979).

9. T. Maniatis, E. F. Fritsch, J. Sambrook, Molecular
Cloning: A Laboratory Manual (Cold Spring Harbor 
Laboratory, Cold Spring Harbor, NY, 1982). 

10. C. Sund, J. Ylikoski, P. Hurskainen, M. Kwiatkowski, 
Nucleos. Nucleot. 7, 655 (1988).

11. D. Pinkel et al., Proc. Natl. Acad. Sci. U.S.A. 85,
9138 (1988).

12. We thank E. Johnsen for technical assistance and T.
Hansson for molecular modeling. U. Pettersson of- 
fered critical comments on this manuscript. Support- 
ed by the Beijer, Procordia, and Borgstrom founda- 
tions; by NUTEK, the Technical and Medical Re- 
search Councils of Sweden; and by the Swedish 
Cancer Fund. 

18 July 1994; accepted 1 September 1994 



In 1990, a breast cancer susceptibility gene, 
known as BRCA1, was localized to chromosome
17q (1). Subsequent studies demonstrated
that BRCA1 accounts for most families
with multiple cases of both early-onset breast 
and ovarian cancer and about 45% of families 
with breast cancer only, but few if any families 
with both male and female breast cancer (2).
Several other genes can confer susceptibility 
to breast cancer. Germline mutations in the 
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p53 gene on chromosome 17p cause a wide 
range of neoplasms including early-onset 
breast cancer, sarcomas, brain tumors, leuke- 
mias, and adrenocortical cancer (3). Certain 
rare abnormalities of the androgen receptor 
appear to be associated with breast cancer in 
men (4), and epidemiological studies have 
suggested that heterozygotes for the ataxia 
telangiectasia gene, AT, on chromosome
11q are at elevated risk of breast cancer
(5). However, mutations in p53 and AT 
can only be responsible for a small minor- 
ity of breast cancer families that are un- 
linked to BRCA1 (6). 

To localize other genes that predispose 
to breast cancer, we performed a genomic 
linkage search using 15 families that had 
multiple cases of early-onset breast cancer 
and that were not linked to BRCA1. These 
families were classified according to the 
number of cases of female breast cancer, 
male breast cancer, and ovarian cancer (Ta- 
ble 1). In addition to a negative lod score 
(logarithm of the likelihood ratio for linkage)
with markers flanking BRCA1, all but
one of the families used for this study had at
least one breast cancer case diagnosed before
age 50 that did not share a BRCA1
haplotype with other breast cancer cases in 
the family. The exception, CRC 136, had 
an obligate sporadic case diagnosed at age 
53. Families were genotyped with polymor- 
phic microsatellite repeat markers (7, 8). 
Typing of the markers D13S260 and 
D13S263 provided provisional evidence for 
the presence of a susceptibility gene on 
chromosome 13, which was subsequently 
confirmed by analysis of additional poly- 
morphisms in the region. 
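
For readers unfamiliar with the lod score mentioned above, the following Python sketch shows the simplest textbook case: a two-point lod score for phase-known, fully informative meioses. This is an illustrative simplification and not the multipoint or pedigree-likelihood analysis used in the study; the function name `lod_score` and the example counts are hypothetical.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """Two-point lod score for phase-known meioses (illustrative only).

    lod(theta) = log10( theta^r * (1 - theta)^(n - r) / 0.5^n ),
    where r is the number of recombinant meioses out of n and theta is
    the recombination fraction being tested (0 <= theta <= 0.5).
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie between 0 and meioses")
    non_recombinants = meioses - recombinants
    if theta == 0.0:
        # Any observed recombinant is impossible under complete linkage.
        if recombinants > 0:
            return float("-inf")
        log_likelihood_linked = 0.0
    else:
        log_likelihood_linked = (recombinants * math.log10(theta)
                                 + non_recombinants * math.log10(1.0 - theta))
    log_likelihood_unlinked = meioses * math.log10(0.5)
    return log_likelihood_linked - log_likelihood_unlinked


# Hypothetical example: 1 recombinant in 10 informative meioses.
for theta in (0.05, 0.1, 0.2, 0.5):
    print(theta, round(lod_score(1, 10, theta), 3))
```

A negative lod score at a given recombination fraction means the data are less likely under linkage at that distance than under free recombination, which is the sense in which the families above were classified as unlinked to BRCA1.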
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SNP attack on complex traits 

Single nucleotide polymorphisms (SNPs) are major contributors to genetic varia- 
tion, comprising some 80% of all known polymorphisms, and their density in the 
human genome is estimated to be on average 1 per 1,000 base pairs. Although SNPs 
are mostly biallelic (and consequently less informative than microsatellite markers),
they are more frequent and mutationally more stable, making them suitable for
association studies in which linkage disequilibrium (LD) between markers and an 
unknown variant is used to map disease-causing mutations. In addition, because 
SNPs have only two alleles, they can be genotyped by a simple plus/minus assay 
rather than a length measurement, making them more amenable to automation. 

These are good reasons to develop SNPs as useful markers, but hardly sufficient 
to explain the momentum that the SNP movement has recently acquired, which 
stems from the hope that SNP-based approaches will lead to progress in the search
for genetic variation associated with common diseases or sensitivity to drugs. At a 
recent meeting*, advances in SNP technology and SNP-based approaches to tackle 
complex traits as well as questions of human origin and prehistory were discussed. 
Frustrated with linkage analysis, which has had little success in identifying genes 
involved in determining complex traits, many geneticists have turned towards 
association studies which might be better suited to detecting genetic effects of low 
penetrance with higher resolution. For such studies, many more markers will be 
required — in addition to better statistical tools and high-throughput low-cost 
genotyping technology to analyse large marker sets in many samples. 

Increasing amounts of sequence data available in public and private databases, 
(within which SNPs can be discovered in silico; Pui-Yan Kwok, Macdonald Morris),
efforts underway to re-sequence DNA stretches from several individuals, and the use 
of 'SNP discovery' technology (such as denaturing high performance liquid chromatography;
Peter Underhill), have led to the rapid accumulation of catalogued SNPs.
So far, no SNP has been patented, but a number of applications are pending (Christian
Stein), and it seems likely that many will end up in proprietary collections. Even
with the best tools, understanding complex traits and human variation will be a chal- 
lenge, to say the least; sharing resources will help. Two publicly available SNP data- 
bases as well as several SNP collections exist at present (see box) — and researchers are 
encouraged to submit any SNP that they discover. 

The technological and economic goal is accurate, easy, cheap and fast large-scale 
SNP genotyping. Several methods are currently being developed, and it is unclear 
which one(s) will turn out to be the best. Examples based on minisequencing on 
DNA arrays (Ann-Christine Syvanen, Andres Metspalu), dynamic allele-specific
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hybridization (DASH, Anthony Brookes), microplate array diagonal gel 
electrophoresis (MADGE, Ian Day), pyrosequencing (Pal Nyren), 
oligonucleotide-specific ligation (according to Ed Southern, the most sen- 
sitive assay) as well as the Whitehead/Affymetrix SNP chips (Jian-Bing 
Fan) and the TaqMan system (Ken Livak) were discussed. All of them 
require target amplification of each SNP by PCR. Even in the light of 
encouraging progress in multiplexing PCR (Michelle Cargill), a large 
number of individual reactions is required and the cost is considerable 
(James Weber). Ideally, one would like to determine the genotype directly 
from genomic DNA. Methods based on the generation of small signal 
molecules by invasive cleavage followed by mass spectrometry (Timothy 
Griffin) or immobilized padlock probes and rolling-circle amplification 
(Ulf Landegren) might eventually eliminate the need for PCR. 

Apart from the challenges of generating SNP maps and efficient geno- 
typing, how easy will it be to determine which SNPs are suitable for a 
particular question and how best to analyse the data? In the absence of 
understanding what makes complex traits complex, classical mendelian 
concepts (two alleles, normal versus abnormal) are usually imposed onto 
a more complicated reality. Joseph Terwilliger warned that only if the 
genes underlying complex diseases have one wild-type and one (or one 
major) susceptibility allele — that is, when allelic heterogeneity is low — 
is statistical analysis likely to detect association of the causative allele (or 
linked markers) with the disease phenotype. Intuitively, more markers 
should allow increased accuracy, but in statistical reality, this also means 
larger samples will be necessary or the risk of obtaining false positive 
results will increase. Skeptical about the use of SNPs in disease genetics, 
Terwilliger is nonetheless enthusiastic about their potential use in population 
genetics and genetic epidemiology. By way of contrast Marta Blumenfeld and Nik 
Schork described a strategy by which they can overcome many of the statistical 
obstacles of SNP-based association studies. By sequencing DNA from a minimum 
of 100 individuals to establish SNP allele frequency, calculating LD strength in a 
region of interest prior to determining how many markers are needed, and 
analysing haplotypes (2-6 SNPs together) instead of individual markers, they 
have been able to identify new genes associated with complex traits — unfortu- 
nately the identities of the genes were not disclosed, and so proof of principle is 
yet to be provided. 
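
The statistical concern raised above (more markers means more tests, and therefore either larger samples or more false positives) can be made concrete with a short Python sketch. It assumes independent tests and a nominal per-marker significance level of 0.05; both are simplifying assumptions introduced here for illustration, since markers in linkage disequilibrium are correlated, and the function names are hypothetical.

```python
def familywise_error_rate(alpha: float, n_markers: int) -> float:
    """Probability of at least one false positive among n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_markers


def bonferroni_threshold(alpha: float, n_markers: int) -> float:
    """Per-marker significance level that keeps the family-wise rate near alpha."""
    return alpha / n_markers


for n in (1, 10, 100, 500):
    print(n,
          round(familywise_error_rate(0.05, n), 4),
          bonferroni_threshold(0.05, n))
```

Under these assumptions the chance of at least one spurious association approaches certainty as the marker count grows, while the stricter per-marker threshold needed to compensate is what drives the requirement for larger samples.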

Although the jury is still out on whether SNPs will provide easy answers to 
complex questions, they are increasingly popular with disease and population 
geneticists. While the former mainly concentrate on SNPs within or close to 
genes, the latter often prefer markers outside of genes (to avoid selection) and in 
areas of the genome devoid of recombination. Several approaches using SNPs on 
the Y chromosome (Chris Tyler-Smith, Francesc Calafell) and in a low-recombi- 
nation interval on the X (Svante Paabo) provide interesting leads on human his- 
tory, as well as data about age, frequency and population distribution of SNPs. Of 
course, this is information directly relevant to disease geneticists, and underscores 
the need for more interaction between population and disease geneticists 
(Andrew Clark, Rosalind Harding). Knowledge about population evolution and
history will reveal suitable populations for genetic studies and aid in study design 
and interpretation of results. 

Time — or rather data — will tell whether SNPs live up to expectations. 
As Aravinda Chakravarti stated in his abstract, "Each genetic approach, 
considered either optimistic or pessimistic, has its underlying assumptions.
Human geneticists have to begin to test these assumptions not by computer
simulations and theoretical arguments but by empirical observations".



SNP databases 

■ HGBASE (http://hgbase.interactiva.de)
collects intragenic SNPs and
contains approximately 2,700 entries. 
It is searchable by sequence and, at 
the moment, the only database 
where information can be deposited 
and retrieved. 

■ dbSNP (http://www.ncbi.nlm.nih.gov/SNP/),
a joint effort by the
NHGRI and the NCBI, is now accepting
submissions; its curators are still
working on making content available; 
the database will be searchable by STS 
accession number and fully integrated 
with GenBank. 

SNP websites 

■ The MIT SNP database
(http://www-genome.wi.mit.edu/SNP/human/index.html)
contains over 3,000 SNPs
(approximately two thirds of them
mapped) and is searchable by genomic
region or internal STS identifier.

■ The WashU SNP database
(http://www.ibc.wustl.edu/SNP/) contains
several hundred SNPs which are cur- 
rently being integrated into dbSNP. 
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Large-Scale Identification, Mapping, and 

Genotyping of Single-Nucleotide 
Polymorphisms in the Human Genome 

David G. Wang, Jian-Bing Fan, Chia-Jen Siao, Anthony Berno,
Peter Young, Ron Sapolsky, Ghassan Ghandour, 
Nancy Perkins, Ellen Winchester, Jessica Spencer, 
Leonid Kruglyak, Lincoln Stein, Linda Hsie, 
Thodoros Topaloglou, Earl Hubbell, Elizabeth Robinson,
Michael Mittmann, Macdonald S. Morris, Naiping Shen, 
Dan Kilburn, John Rioux, Chad Nusbaum, Steve Rozen, 
Thomas J. Hudson, Robert Lipshutz,* Mark Chee, 
Eric S. Lander* 

Single-nucleotide polymorphisms (SNPs) are the most frequent type of variation in the 
human genome, and they provide powerful tools for a variety of medical genetic studies. 
In a large-scale survey for SNPs, 2.3 megabases of human genomic DNA was examined 
by a combination of gel-based sequencing and high-density variation-detection DNA 
chips. A total of 3241 candidate SNPs were identified. A genetic map was constructed 
showing the location of 2227 of these SNPs. Prototype genotyping chips were developed 
that allow simultaneous genotyping of 500 SNPs. The results provide a characterization 
of human diversity at the nucleotide level and demonstrate the feasibility of large-scale 
identification of human SNPs. 



Although the Human Genome Project still 
has tremendous work ahead to produce the 
first complete reference sequence of the 
human chromosomes, attention is already 
focusing on the challenge of large-scale 
characterization of the sequence variation 
among individuals (1). This genetic diversity 
is of interest because it explains the 
basis of heritable variation in disease 
susceptibility, as well as harbors a record of 
human migrations. 

The most common type of human genetic 
variation is the SNP, a position at which 
two alternative bases occur at appreciable 
frequency (> 1%) in the human population. 
There has been growing recognition that 
large collections of mapped SNPs would 
provide a powerful tool for human genetic 
studies (1, 2). SNPs can serve as genetic 
markers for identifying disease genes by 
linkage studies in families, linkage 
disequilibrium in isolated populations, association 
analysis of patients and controls, and 
loss-of-heterozygosity studies in tumors (1, 2). 
Although individual SNPs are less informative 
than currently used genetic markers 
(3), they are more abundant and have 
greater potential for automation (4, 5). 

We performed an initial survey to identify 
SNPs by using conventional gel-based 
DNA sequencing to examine sequence-tagged 
sites (STSs) distributed across the 
human genome. STSs are short genomic 
sequences that can be amplified from DNA 
samples by means of a corresponding 
polymerase chain reaction (PCR) assay. From 
among 24,568 STSs used in the construction 
of a physical map of the human genome 
at the Whitehead Institute for Biomedical 
Research/MIT Center for Genome 
Research (6, 7), an initial collection of 
1139 STSs was chosen (8). These STSs 
contained a total of 279 kb of genomic 
sequence (9), with one-third from random 
genomic sequence and two-thirds from 
3'-ends of expressed sequence tags (3'-ESTs) 
and primarily representing untranslated 
regions of genes. Each STS was amplified 
from four samples (10): three individual 
samples and a pool of 10 individuals 
(thereby permitting allele frequencies to be 
estimated among 20 chromosomes). The PCR 
products were subjected to single-pass DNA 
sequencing based on fluorescent-dye primers 
and gel electrophoresis; sequence traces 
were compared by a computer program 
followed by visual inspection (11). Candidate 
SNPs were declared when two alleles were 
seen among the three individuals, with both 
alleles present at a frequency greater than 
30% in the pooled sample. The term 
"candidate SNP" is used because a subset of such 
apparent polymorphisms turn out to be 
sequencing artifacts, as discussed below. 
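
The calling rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the two-letter genotype encoding, and the dictionary of pooled allele frequencies are hypothetical, and the text does not say how pooled frequencies were estimated from the sequence traces.

```python
def is_candidate_snp(individual_genotypes, pool_allele_freqs, min_pool_freq=0.30):
    """Apply the candidate-SNP rule to one sequence position.

    individual_genotypes: two-letter genotype strings for the individually
        sequenced samples (three in the survey), e.g. ["AA", "AG", "AA"].
    pool_allele_freqs: allele -> estimated frequency in the pooled sample
        (hypothetical input; how it is measured is not specified here).
    """
    # Alleles seen among the individually sequenced samples.
    alleles = set("".join(individual_genotypes))
    if len(alleles) != 2:
        return False
    # Both alleles must exceed the frequency threshold in the pool.
    return all(pool_allele_freqs.get(a, 0.0) > min_pool_freq for a in alleles)


print(is_candidate_snp(["AA", "AG", "AA"], {"A": 0.65, "G": 0.35}))
print(is_candidate_snp(["AA", "AG", "AA"], {"A": 0.80, "G": 0.20}))
```

The second call fails the rule because the minor allele falls below the 30% pool threshold, even though two alleles are seen among the individuals.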

The survey identified 279 candidate 
SNPs, distributed across 239 of the STSs. 
This corresponds to a rate of one SNP per 
1001 base pairs (bp) screened and an 
observed nucleotide heterozygosity of 
$H = 3.96 \times 10^{-4}$ (Table 1). Expressed sequences 
(3'-ESTs) showed a lower polymorphism 
rate than random genomic sequence (with 
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the difference falling short of statistical 
significance at P = 0.057, one-sided), 
consistent with greater constraint within genic 
sequences. The ratio of transitions to 
transversions was 2:1. Although the 
dinucleotide CpG makes up only about 2% of the 
sequence surveyed, nearly 25% of the SNPs 
occurred at such sites, with the substitution 
almost always being C→T. Cytosine residues 
within CpG dinucleotides are the most 
mutable sites within the human genome, 
because most are methylated and can 
spontaneously deaminate to yield a thymidine 
residue (12). In addition to the single-base 
substitutions, 23 insertion-deletion 
polymorphisms were also found (with all but eight 
involving a single base), corresponding to a 
frequency of one per 12 kb surveyed. 

Gel-based resequencing was satisfactory 
for the initial screen, but we sought a 
more streamlined approach for a larger 
scale SNP identification. One such 
approach involves hybridization to 
high-density DNA probe arrays (13). Such 
"DNA chips" can be produced with parallel 
light-directed chemistry to synthesize 
specified oligonucleotide probes covalently 
bound at defined locations on a glass 
surface or "chip" (14). A target DNA 
sequence of length L can be screened for a 
polymorphism by hybridizing a biotin-labeled 
sample to a variant detector array 
(VDA) of size 8L (Fig. 1). For each position 
on both strands, the array has four 
25-nucleotide oligomer probes complementary to the 
sequence centered at the position. The four 
differ only in that the central (13th) position 
is substituted by each of the four nucleotides. 
Homozygotes (AA) for the expected 
sequence should hybridize more strongly to the 
perfectly complementary probe than to the 
three probes containing a central mismatch. 
The presence of an SNP would be expected 
to give rise to a different hybridization 
pattern, with homozygotes (BB) showing strong 
hybridization to an alternative base and 
heterozygotes (AB) showing strong 
hybridization to two probes. The VDA thus signals 
the presence of a sequence variation (by a 
change in the hybridization pattern) and, in 
many cases, indicates the nature of the 
change (by a gain of signal at a specific 
mismatch probe). VDAs have been used for 
mutation detection of small, well-studied 
DNA targets [such as 387 bp from the 
human immunodeficiency virus-1 genome, 3.5 
kb from the breast cancer-associated 
BRCA1 gene, and 16.6 kb from the human 



Fig. 1. SNP screening on chips. (A) Small portion of a VDA for an STS hybridized 
with the expected target sequence (GAATTAGTCAAGCAGGTCAGATACTATTGTCTGCT). 
Chip features in each column are complementary to successive overlapping 
25-nucleotide oligomer subsequences, with the central base substituted by 
A, C, G, or T in the four rows. Variations from the expected sequence can usually be 
detected by examination of the most intense signal in each column. (B) The same VDA was hybridized 
with sequence containing an SNP (A→C) at position 19 (GAATTAGTCAAGCAGGTCCGATACTATTGTCTGCT). 
The hybridization signal is now stronger at an alternative base at this position. It is also weaker at the 
surrounding positions (for example, positions 12 to 18 and 20 to 26), because probes at these positions 
are designed to be complementary to the A allele at the SNP and mismatch with the C allele. 



mitochondrion (13, 15)] in large numbers of 
samples. In this setting, the normal 
hybridization pattern can be characterized with 
precision and single-base substitutions 
detected with high accuracy. 
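
The probe layout of a VDA can be illustrated with a short Python sketch. It is a simplified model, not the chip design software: the function names are hypothetical, and positions within 12 bases of either end of the target are skipped for simplicity, so the sketch yields 8(L - 24) probes rather than the 8L quoted in the text.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def vda_probes(target, probe_len=25):
    """Yield (strand, center_index, central_base, probe) for a tiled VDA.

    For each interior position on both strands, four probes are produced
    that differ only in the central (13th) base of a 25-nucleotide window.
    """
    half = probe_len // 2  # 12 bases on each side of the central base
    for strand, seq in (("+", target), ("-", reverse_complement(target))):
        for center in range(half, len(seq) - half):
            window = seq[center - half:center + half + 1]
            for base in "ACGT":
                # Substitute the central base, then reverse-complement so the
                # probe would hybridize to a target carrying that base.
                variant = window[:half] + base + window[half + 1:]
                yield strand, center, base, reverse_complement(variant)


target = "GAATTAGTCAAGCAGGTCAGATACTATTGTCTGCT"  # expected sequence from Fig. 1
probes = list(vda_probes(target))
print(len(probes))
```

For each position, exactly one of the four probes is perfectly complementary to the expected sequence; a sample carrying an SNP at that position would instead match one of the other three.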

In this project, we used VDAs in a 
large-scale survey. A total of 16,725 STSs 
covering 2 Mb of human DNA were selected, 
with one-third from random genomic 
sequence and two-thirds from 3'-ESTs. The 
survey used 149 distinct chip designs, each 
containing 150,000 to 300,000 features. 
The STSs were examined in seven 
individuals, representing about 14 Mb of 
genomic sequence. For each chip, the 
corresponding STSs were amplified from an 
individual, pooled together, labeled with 
biotin, hybridized, and stained (16), and the 
resulting hybridization patterns were 
compared by a computer program followed by 
visual inspection (17). At each position, 
samples were classified as homozygous for 
the expected sequence, homozygous for an 
alternative sequence, or heterozygous. 

A collection of 2748 candidate SNPs 
was identified, corresponding to a rate of 
one per 721 bp surveyed and an observed 
nucleotide heterozygosity of $4.58 \times 10^{-4}$ 
(Table 1). The number of STSs containing 
SNPs was 2299. The SNPs had a mean 
heterozygosity of 33%, with the minor allele 
having a mean frequency of 25%. SNPs 
were found less often in 3'-ESTs than in 
random genomic sequence (P < 0.023, 
one-sided), consistent with greater constraint in 
genic regions. 

The nucleotide heterozygosity rate was 
indistinguishable from the estimate 
obtained from gel-based sequencing (P > 
0.12, two-sided test), as was the ratio of 
transitions to transversions and the 
proportion of SNPs occurring at CpG 
dinucleotides. SNPs were detected at a higher 
frequency in the chip-based survey because 
more samples were surveyed (seven versus 
three individuals). The observed increase of 
38.8% (1/721 versus 1/1001) agreed closely 


Table 1. Results of SNP screening. 

                                     Gel-based sequencing                      Chip-based detection 
Variable                       All STSs      3'-EST       Random        All STSs      3'-EST       Random 
                                             sequences    genomic                     sequences    genomic 
                                                          sequence                                 sequence 
No. of STSs screened           1,139         705          434           16,725        12,649       4,076 
Total bases screened           279,165       186,524      92,641        1,981,030     1,324,320    656,710 
No. of candidate SNPs found    279           161          118           2,748         1,749        999 
SNP frequency (K)              1/1001        1/1159       1/785         1/721         1/757        1/657 
Heterozygosity (H) (× 10^-4)   3.96 ± 0.38   3.42 ± 0.43  5.04 ± 0.07   4.58 ± 0.15   4.36 ± 0.18  5.0? ± 0.28 
No. of STSs containing SNPs    239           137          102           2,299         1,515        784 
% transitions among SNPs       67%           67%          67%           70%           70%          71% 
% SNPs occurring within CpG    24%           23%          25%           24%           25%          22% 
θ, based on H                  3.96 × 10^-4                             4.58 × 10^-4 
θ, based on K                  4.33 × 10^-4                             4.38 × 10^-4 

with expectation under classical population 
genetic theory (18). This result has 
implications for the choice of sample size for an 
SNP survey (19). 
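
The expected increase can be reproduced with a brief calculation. The sketch below assumes the neutral-theory relation used in this report, in which the proportion of variant sites K grows in proportion to 1 + 1/2 + ... + 1/(n - 1) for a sample of n chromosomes, and it takes n = 6 for the gel-based survey and n = 14 for the chip-based survey (two chromosomes per individual). The function and variable names are illustrative.

```python
def harmonic_sum(n_chromosomes):
    """Return 1 + 1/2 + ... + 1/(n - 1) for a sample of n chromosomes."""
    return sum(1.0 / i for i in range(1, n_chromosomes))


gel_chromosomes = 2 * 3    # three diploid individuals
chip_chromosomes = 2 * 7   # seven diploid individuals

# Expected relative increase in SNP frequency when the sample grows.
expected_increase = harmonic_sum(chip_chromosomes) / harmonic_sum(gel_chromosomes) - 1
# Observed relative increase: one SNP per 721 bp versus one per 1001 bp.
observed_increase = 1001 / 721 - 1

print(f"expected: {expected_increase:.1%}, observed: {observed_increase:.1%}")
```

Under these assumptions the expected increase is about 39.3%, against the observed 38.8%.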

We estimated the error rates in the 
gel-based and chip-based surveys. The 
false-positive rate was estimated by carefully 
confirming candidate SNPs found in each 
survey by using thorough multipass sequencing 
(20): 12% of 220 candidate SNPs found in 
the chip-based survey and 16% of 120 
candidate SNPs found by single-pass gel-based 
sequencing were false positives. The 
false-negative rate was estimated by considering 
a subset of STSs that had been included in 
both surveys: these STSs yielded 55 SNPs 
(all carefully confirmed to eliminate false 
positives), of which eight (15%) were 
missed by single-pass gel-based 
resequencing and seven (13%) were missed by the 
chip-based survey. Many of the errors were 
due to random factors, in that they were 
eliminated simply by repeating the original 
experiment. However, some were 
reproducible artifacts that could be eliminated only 
by changing the detection protocol (for 
example, by using dye terminators rather than 
dye primers in gel-based sequencing). The 
gel-based sequencing and chip-based 
analysis had similar rates of accuracy, with a 
false positive and false negative being found 
roughly every 5000 to 10,000 bases, or 
about 10% of the true SNP frequency. The 
accuracy largely reflects the particular 
implementation of the technologies in a 
high-throughput setting and could be increased 
at the expense of assay optimization. 

Although the two surveys yielded 
comparable accuracy, the survey based on 
VDAs required considerably less laboratory 
work than gel-based resequencing. Both 
approaches required amplifying target loci. 
The gel-based approach then required a 
sequencing reaction and electrophoresis on 
each individual locus, whereas the 
chip-based approach allowed targets totaling 30 
kb to be pooled into a single labeling 
reaction and hybridized (21). 

The SNP collection from the two 
surveys was supplemented by two directed 
approaches based on public databases. First, 
we collected reports from the literature of 
common variants in gene coding regions. 
We were able to confirm 120 of 143 cases 
tested by virtue of detecting two alleles in 
our screening panel; the remainder may be 
true polymorphisms but simply 
monomorphic in the individuals tested. Second, the 
GenBank database contains multiple 
entries for some ESTs. Such entries were 
compared to identify single-nucleotide 
differences, which might reflect either common 
polymorphisms or sequencing errors in 
single-pass EST sequencing. We tested 200 
such apparent differences and confirmed 



Fig. 2. A portion of the SNP genetic map (showing 
human chromosome 1). The full map is available 
on the Whitehead Institute Web site 
(www.genome.wi.mit.edu). Positions are based on 
genetic distances in centimorgans. Genetic 
positions of SNPs were inferred by localizing them 
relative to framework markers by RH mapping and 
then interpolating distances from centirays (on the 
RH map) to centimorgans (on the genetic map). 
Framework marker names are given in full. SNP 
names are named with the prefix WIAF (for 
example, WIAF-17), but the prefix is dropped and only 
the number is shown in the figure. 



the presence of an SNP in 94 cases. These 
two directed approaches thus yielded an 
additional 214 SNPs. 

The project has thus identified 3241 
candidate SNPs to date. Confirmation (22) 
has so far been obtained for 1477 SNPs and 
is expected to yield ~2900 true SNPs. All 
information about the SNPs has been 
deposited on the Whitehead/MIT Center for 
Genome Research Web site 
(www.genome.wi.mit.edu) and will be updated with results 
of additional surveys and confirmation 
tests. The information is also being 
deposited in the GenBank database. 

For SNPs to be useful in human genetic 
studies, they must be assembled into maps 
showing their chromosomal location. To 
create a third-generation map based on 
SNPs, we used whole-genome 
radiation-hybrid (RH) mapping (6, 7, 23), which infers 
the position of loci based on co-retention in 
a panel of human-on-hamster cell lines; it 
has become a primary method for 
constructing maps of the human genome (6, 7). 

The current RH map of the human 
genome is anchored by a scaffold of 1036 
genetic markers from an earlier genetic map 
consisting of simple sequence length 
polymorphisms (SSLPs) (7). SNPs can be 
integrated with respect to the earlier genetic 
map by determining their position on the 
RH map. We have localized 1880 STSs, 
containing 2227 of the 3241 candidate 
SNPs, on the RH map and thereby relative 
to the human genetic map (Fig. 2 and Table 
2). SNPs are not evenly distributed among 
chromosomes or within chromosomes 
because most were derived from ESTs, which 
are known to have an uneven distribution 
(6, 7). SNP-containing STSs are present at 
a mean spacing of 2.0 centimorgans (cM) 
across the genome (24), and the map 
contains 58 intervals greater than 10 cM. The 
genetic distances on the map must be re- 
garded as approximate because they are 
based on interpolation from distances in the 
RH map. It will be desirable to reestimate 
these distances on the basis of direct linkage 
analysis in the CEPH families, as high- 
throughput genotyping for the complete 
SNP collection becomes feasible. 
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We next developed an efficient method 
for large-scale genotyping of SNPs based on 
extending the use of DNA chips from SNP 

Fig. 3. Genotyping chips. (A) Schematic diagram 
of genotyping array for an SNP, consisting of two 
VDAs to study seven nucleotides centered around 
the SNP. The top and bottom arrays are designed 
to be complementary to the allelic sequences 
containing A and C, respectively. Probes perfectly 
matching the A and C alleles are shown in gray 
and black, respectively. A genotyping array for the 
complementary strand was also used but is not 
shown. (B) Hybridization signal for a genotyping 
array probed with samples from three individuals 
with respective genotypes AA, AC, and CC (panels 
labeled A/A homozygote, A/C heterozygote, and 
C/C homozygote). 



discovery to SNP genotyping (5). We 
synthesized genotyping chips containing 
"genotyping arrays" for each SNP to be 
tested. Each genotyping array consists of 
two short VDAs corresponding to the two 
alternative alleles (Fig. 3). The presence of 
an allele should be reflected in strong 
hybridization to the corresponding 
resequencing array. PCR assays were designed for the 
region containing each SNP (25), with the 
goal of being robust and mutually 
compatible; the amplification targets were all small 
(typically, a few nucleotides around the 
polymorphic site), the primers all had 
similar calculated melting temperatures, and 
constant sequences were added to the 
5'-ends of the forward and reverse primers to 
facilitate batch labeling of pooled PCR 
products. Each assay was tested to ensure 
that it amplified a single fragment from 
genomic DNA. 

The most complex genotyping chip 
tested contained genotyping arrays for 558 
candidate SNPs identified in the 
chip-based survey. Initially, the 558 loci were 
separately amplified, pooled, labeled, and 
hybridized to the chip. To determine 
whether each locus could be reliably read, 
we defined a formal detection test: loci 
passed if, for each of three individuals 
tested, the expected DNA sequence could 
be successfully read on both strands for 
one or both alleles. In all, 98% of the loci 
passed this detection test (with the 
remaining 2% failing as a result of weak 
hybridization or cross-hybridization). 

We next sought to decrease substantially 
the sample preparation required to 
genotype large numbers of SNPs, as required to 
perform a genome scan. We developed a 
protocol based on multiplex PCR in which 
primer pairs from many different loci are 
combined in a single reaction (26). 
Although it is typically difficult to combine 
many PCR assays, the approach worked 
well for our SNP assays: 92% of the 558 loci 
passed the detection test when 
amplification was performed in 24 sets of ~23 loci; 
90% passed when amplified in 12 sets of 
~46 loci; 85% passed when amplified in 6 
sets of ~92 loci; and 50% passed when 
amplified in a single set of 558 loci. The 
success appears to have resulted from a 
combination of factors, including the small 
size of the amplification targets, 
optimization of amplification conditions, and the 
presence of the constant sequence at the 
5'-ends of the primers (27). It may be 
possible to salvage the unsuccessful assays by 
grouping them into additional multiplex 
sets, or by redesigning the assays. 

Multiplex amplification of sets of 46 loci 
was used in subsequent experiments because 
it decreased the number of reactions by a 
factor of 46 while allowing the vast majority 
(512/558) of loci to be assayed. The 
procedure was further tested in 39 individuals 
and was quite consistent: 96% of the 512 
loci could be successfully read in 100% of 
individuals tested and the remainder in 
nearly all individuals. 

We next developed a genotyping 
algorithm for each SNP. Loci were declared to 
pass a cluster test if the hybridization 
patterns seen in a test set of 39 individuals fell 
into distinct clusters, corresponding to the 
possible genotypes (28). These clusters 
could then be used to assign genotypes for 
further samples (29). 

Table 2. Chromosomal distribution of genetic markers. 
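
To make the idea of assigning genotypes from hybridization clusters concrete, here is a deliberately simplified Python sketch. It is not the algorithm used in the study, which derives clusters from the data for each locus. The sketch assumes a relative allele signal between 0 and 1 and fixed, idealized cluster centers at 0, 0.5, and 1; the function names, the centers, and the no-call margin of 0.15 are all hypothetical.

```python
# Idealized cluster centers for the relative signal of allele A (assumed).
IDEAL_CENTERS = {"AA": 1.0, "AB": 0.5, "BB": 0.0}


def relative_allele_signal(signal_a, signal_b):
    """Fraction of total hybridization signal attributable to allele A."""
    total = signal_a + signal_b
    if total <= 0:
        raise ValueError("no hybridization signal")
    return signal_a / total


def call_genotype(signal_a, signal_b, max_distance=0.15):
    """Assign the nearest genotype cluster, or None when no cluster is close."""
    v = relative_allele_signal(signal_a, signal_b)
    genotype, center = min(IDEAL_CENTERS.items(), key=lambda kv: abs(kv[1] - v))
    return genotype if abs(center - v) <= max_distance else None


print(call_genotype(950, 50))   # signal dominated by allele A
print(call_genotype(500, 520))  # roughly balanced signals
print(call_genotype(700, 300))  # between clusters, so no call is made
```

A locus whose samples mostly fall between clusters, like the last example, would correspond to one that fails the cluster test; in practice the centers would be estimated per locus because the two alleles can hybridize with different intensity.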

The cluster test was applied to the ~500 
candidate SNPs that worked well under 
multiplex amplification conditions: 75% 
passed the cluster test, and careful 
resequencing demonstrated that all such loci 
were true polymorphisms. The cluster test 
thus provides reliable confirmation of an 
SNP. The remaining 25% failed the cluster 
test, and resequencing revealed that half 
were false positives in the SNP screen and 
half were true polymorphisms (with the 
poor discrimination on the chip typically 
due to one allele hybridizing more weakly 
than the other). Thus, 88% of the 
candidate SNPs proved to be true 
polymorphisms, and 86% of true SNPs passed the 
cluster test. 
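
The two summary percentages follow from the pass and fail fractions just given. The short Python calculation below spells out the arithmetic, using the rounded fractions stated in the text (75% passing, and half of the remaining 25% being true polymorphisms); the variable names are illustrative.

```python
passed = 0.75             # passed the cluster test; all were true SNPs
failed = 1 - passed       # failed the cluster test
failed_true = failed / 2  # half of the failures were true polymorphisms

# Fraction of all candidate SNPs that are true polymorphisms.
true_fraction = passed + failed_true
# Fraction of true SNPs that pass the cluster test.
pass_rate_among_true = passed / true_fraction

print(f"true SNPs: {true_fraction:.3f}, passing among true: {pass_rate_among_true:.3f}")
```

The results, 0.875 and about 0.857, correspond to the 88% and 86% quoted above.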

To test the reproducibility and accuracy 
of the genotyping method, we genotyped a 
set of 91 loci (passing the cluster test) in 
three individuals by performing chip-based 
genotyping on six separate occasions over a 
2-month period. The correct genotypes 
were independently determined by 
thorough gel-based resequencing. The 
genotyping-chip assay assigned a genotype in 98% 
of cases (1613/1638), and this assignment 
proved correct in 99.9% (1611/1613) of 
these cases. The loci were also genotyped in 
two complete CEPH families. The 
genotypes were not independently confirmed, 
but they were fully consistent with 
mendelian segregation. 
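
The counts in this paragraph can be checked directly. The sketch below recomputes the number of genotype attempts (91 loci, three individuals, six occasions) and the resulting call rate and accuracy; the variable names are illustrative.

```python
loci, individuals, occasions = 91, 3, 6
attempted = loci * individuals * occasions  # total genotype attempts
assigned = 1613                             # genotypes assigned by the chip assay
correct = 1611                              # assignments confirmed by resequencing

call_rate = assigned / attempted
accuracy = correct / assigned

print(f"attempted={attempted}, call rate={call_rate:.3f}, accuracy={accuracy:.4f}")
```

The product 91 x 3 x 6 gives the 1638 attempts in the denominator, and the two ratios match the reported 98% and 99.9%.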

For SNPs passing the cluster test, highly 
accurate genotypes could thus be obtained 
with the simple design used here. For the 
remaining SNPs (14%), similar accuracy 
can likely be obtained but may require 
optimization of the genotyping array design, 
depending on the locus [as shown in (5)]. 

The SNP surveys provide data about 
human genetic diversity. Two classical 
measures of diversity (30) are H, the average 
heterozygosity per nucleotide, and K, the 
proportion of sites harboring a variation. H 
does not depend on sample size, whereas K 
increases with the number of genomes 
surveyed. For a population at equilibrium, the 
neutral theory of evolution relates H and K 
to the classical population genetic 
parameter $\theta = 4N_e\mu$, where $N_e$ is the effective 
population size and $\mu$ is the mutation rate 
per nucleotide. ($\theta$ can be thought of as 
twice the number of new mutations per 
generation arising in a population with size 
$N_e$.) Specifically, $H \approx \theta$ and 
$K \approx \theta\,[1 + 2^{-1} + 3^{-1} + \cdots + (n - 1)^{-1}]$, 
where n is the number of genomes surveyed, provided 
that $\theta$ is small. From these equations, one 
can estimate $\theta$ based on H or K. 
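
As a worked illustration of these two estimators, the Python sketch below computes theta from K for both surveys, using the counts in Table 1. It assumes that n is the number of chromosomes sampled (two per individual, so 6 and 14), and the function names are hypothetical. The estimate based on H is simply H itself under this model.

```python
def theta_from_h(h):
    """Under the neutral equilibrium model, H is approximately theta."""
    return h


def theta_from_k(k, n_chromosomes):
    """Solve K = theta * (1 + 1/2 + ... + 1/(n - 1)) for theta."""
    a_n = sum(1.0 / i for i in range(1, n_chromosomes))
    return k / a_n


# Gel-based survey: 279 candidate SNPs in 279,165 bases, 3 individuals.
theta_gel = theta_from_k(279 / 279_165, n_chromosomes=6)
# Chip-based survey: 2,748 candidate SNPs in 1,981,030 bases, 7 individuals.
theta_chip = theta_from_k(2_748 / 1_981_030, n_chromosomes=14)

print(f"theta (gel, from K):  {theta_gel:.2e}")
print(f"theta (chip, from K): {theta_chip:.2e}")
print(f"theta (gel, from H):  {theta_from_h(3.96e-4):.2e}")
```

Both K-based values come out close to 4 x 10^-4, in line with the H-based values.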

The human population is not at 
equilibrium, but rather underwent a rapid 
population expansion in the past 100,000 to 
200,000 years. Such population explosions 
tend to suppress the effects of genetic drift 
and thus preserve the distribution of 
common alleles and the value of $\theta$. 
Accordingly, the value of $\theta$ is relevant to the 
ancestral human population, before its 
recent expansion. 

The four estimates of $\theta$ derived from H 
and K for the two surveys are all roughly 
$\theta \approx 4 \times 10^{-4}$ (Table 1). Assuming a 
mutation frequency of $\mu \approx 10^{-9}$ to $10^{-8}$, this 
would suggest an effective population size of 
$N_e \approx 10^{4}$ to $10^{5}$, which seems reasonable 
for the ancestral population preceding the 
explosion in the last 100,000 years (31). 
Strictly speaking, these estimates apply only 
to the European population, from which all 
samples were drawn. However, a 
preliminary survey of a more diverse sample of 31 
individuals representing all major racial 
groups yielded a value of $\theta$ that is only 30% 
larger (26), consistent with the idea that 
human variation occurs primarily within 
rather than between racial groups (32). 
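
The range quoted for the effective population size can be reproduced with a one-line rearrangement. The sketch below assumes the standard neutral-theory relation theta = 4 * N_e * mu for a diploid population, which is consistent with the description of theta given earlier in this report; the function name is illustrative.

```python
def effective_population_size(theta, mu):
    """Solve theta = 4 * N_e * mu for N_e (diploid population assumed)."""
    return theta / (4.0 * mu)


theta = 4e-4
for mu in (1e-9, 1e-8):
    n_e = effective_population_size(theta, mu)
    print(f"mu = {mu:.0e}  ->  N_e = {n_e:.0e}")
```

With theta of about 4 x 10^-4, mutation rates of 10^-9 and 10^-8 per nucleotide give N_e of about 10^5 and 10^4, respectively.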

The resources reported here represent 
only a first step toward a dense SNP map of 
the human genome. The genetic map 
should already be useful for family-based 
linkage studies, given the average spacing 
(2 cM) and average heterozygosity (34%) of 
the markers. (The heterozygosity applies to 
the European-derived samples studied here, 
but a preliminary survey of ~180 of the 
SNPs shows that most are also polymorphic 
in other groups.) It still remains to develop 
a suitable genotyping system, such as a 
2000-SNP genotyping chip. 

Large-scale screening for human 
variation is clearly feasible. Someday it may 
become possible to screen entire human 
genomes. In the nearer term, a key goal will 
be to extend SNP discovery to the protein 
coding regions of all human genes (roughly 
120 Mb of sequence, only about 40 times 
more than the current study) in order to 
catalog the common variants that may 
explain susceptibility to common genetic 
traits and diseases (1). 

methylammonium chloride, 10 mM tris-HCl (pH 7.9), 
1 mM EDTA, 0.01% Triton X-100, herring sperm 
DNA (100 µg/ml), and 200 pM control oligomer) at 

with 6× SSPET at 22°C, and stained at room 
temperature with staining solution [streptavidin 
R-phycoerythrin (2 µg/ml) (Molecular Probes) and 
acetylated bovine serum albumin (... mg/ml) in 6× 
SSPET] for 8 min. After they were stained, the chips 
were washed 10 times with 6× SSPET at 22°C on a 
fluidics workstation (Affymetrix). Hybridization to the 
chip was detected by using a confocal chip scanner 
(HP/Affymetrix) with a resolution of 40 to 80 pixels 
per feature and a 560-nm filter. 

substitutions on both strands) in individual i and 

39.3% when the number of genomes is increased 
from 6 (in the gel-based survey) to 14 (in the 
chip-based survey). This agrees well with the observed 
increase of 

19. A relatively small sample size suffices to capture 
much of the common variation. The sample size of 
m hv. a ,w* chance of detecting an allele with a 
frequency of fj%. Doubling the proportion of variant 
sites identified would require increasing the ro.irrw 

ume containing 100 ng of human genomic DNA, 0.1 
to 0.2 µM of each primer, 1 unit of AmpliTaq Gold 
(Perkin-Elmer), 1 mM deoxynucleotide triphosphates 
(dNTPs), 10 mM tris-HCl (pH 8.3), 50 mM KCl, 5 mM 
MgCl2, and 0.001% gelatin. Thermocycling was 
performed on a Tetrad (MJ Research), with initial 
denaturation at 96°C for 10 min followed by 30 cycles of 
denaturation at 96°C for 30 s, primer annealing at 
55°C for 2 min, and primer extension at 65°C for 2 
min. After 30 cycles, a final extension reaction was 
carried out at 65°C for 5 min. Because the resulting 
PCR products were small, it was unnecessary to 
fragment them (as was done for the STSs in the SNP 
screen). The PCR products were then labeled with 
biotin in a standard PCR reaction, by using T7 and 
T3 primers with biotin labels at their 5'-ends. The 
reaction was performed with 1 µl of template DNA, 
0.1 to 0.2 µM labeled primer, 1 unit of AmpliTaq Gold 
(Perkin-Elmer), 100 µM dNTPs, 10 mM tris-HCl (pH 
8.3), 60 mM KCl, 1.5 mM MgCl2, and 0.001% 
gelatin. Thermocycling was performed with initial 
denaturation at 96°C for 10 min followed by 25 cycles of 
denaturation at 96°C for 30 s, primer annealing at 
52°C for 1 min, and primer extension at 72°C for 1 
min. After 25 cycles, a final extension reaction was 
carried out at 72°C for 5 min. The PCR products 
from the various multiplex reactions for an individual 
were then pooled together. One-tenth of the pooled 
sample was denatured and used for chip 
hybridization. Chips were hybridized, washed, stained, and 

comparing the observed hybridization signal to the 
expected signals for the two VDAs. The values 
for the 39 individuals lie in the interval [0, 1] and 
should ideally cluster near 0, 0.5, and 1.0, but other 
patterns might occur because of differences in 
hybridization intensity between the two alleles. The 
values were optimally clustered (33) with the MQD- 



The cellular properties of neurons are 
modulated by a number of extrinsic signals, 
including synaptic activity, neurotrophic 
factors, and hormones. These signaling 
systems alter the intracellular concentrations 
of second messengers such as calcium and 
cyclic nucleotides, and these small mole- 
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RasGRP, a Ras Guanyl Nucleotide- 
Releasing Protein with Calcium- and 
Diacylglycerol-Binding Motifs 

Julius O. Ebinu, Drell A. Bottorff, Edmond Y. W. Chan, 
Stacey L. Stang, Robert J. Dunn, James C. Stone* 

RasGRP, a guanyl nucleotide-releasing protein for the small guanosine triphosphatase 
Ras, was characterized. Besides the catalytic domain, RasGRP has an atypical pair of 
"EF hands" that bind calcium and a diacylglycerol (DAG)-binding domain. RasGRP 
activated Ras and caused transformation in fibroblasts. A DAG analog caused sustained 
activation of Ras-Erk signaling and changes in cell morphology. Signaling was 
associated with partitioning of RasGRP protein into the membrane fraction. Sustained 
ligand-induced signaling and membrane partitioning were absent when the DAG-binding 
domain was deleted. RasGRP is expressed in the nervous system, where it may couple 
changes in DAG and possibly calcium concentrations to Ras activation. 
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